1 /*
2 * Licensed to the Apache Software Foundation (ASF) under one or more
3 * contributor license agreements. See the NOTICE file distributed with
4 * this work for additional information regarding copyright ownership.
5 * The ASF licenses this file to You under the Apache License, Version 2.0
6 * (the "License"); you may not use this file except in compliance with
7 * the License. You may obtain a copy of the License at
8 *
9 * http://www.apache.org/licenses/LICENSE-2.0
10 *
11 * Unless required by applicable law or agreed to in writing, software
12 * distributed under the License is distributed on an "AS IS" BASIS,
13 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14 * See the License for the specific language governing permissions and
15 * limitations under the License.
16 */
17
18 package org.apache.commons.numbers.arrays;
19
20 /**
21 * Select indices in array data.
22 *
23 * <p>Arranges elements such that indices {@code k} correspond to their correctly
24 * sorted value in the equivalent fully sorted array. For all indices {@code k}
25 * and any index {@code i}:
26 *
27 * <pre>{@code
28 * data[i < k] <= data[k] <= data[k < i]
29 * }</pre>
30 *
31 * <p>Examples:
32 *
33 * <pre>
34 * data [0, 1, 2, 1, 2, 5, 2, 3, 3, 6, 7, 7, 7, 7]
35 *
36 *
37 * k=4 : [0, 2, 1, 1], [2], [6, 3, 2, 3, 5, 7, 7, 7, 7]
38 * k=4,8 : [0, 1, 2, 1], [2], [3, 3, 2], [5], [7, 7, 6, 7, 7]
39 * </pre>
40 *
41 * <p>Notes:
42 *
43 * <ul>
44 * <li>The algorithm does not require that all data with a value equal to the target index
45 * are ordered consecutively in a range containing the target index.
46 * <li>The algorithm may reorder any part of the array above and below the target indices.
47 * <li>Correct usage for multiple target indices should not call multiple times with each
48 * index but instead call selection only once with all indices. Use of consecutive calls
49 * can reorder previously selected indices.
50 * </ul>
51 *
52 * <p>Selection on multiple indices will handle duplicate and
53 * unordered indices. The method detects ordered indices (with or without duplicates) and
54 * uses this during processing. Passing ordered indices is recommended if the order is already
55 * known; for example using uniform spacing through the array data, or to select the top and
56 * bottom {@code n} values from the data.
57 *
58 * <p>Implementation details
59 *
60 * <p>A quickselect adaptive method is used for single indices. This uses analysis of the
61 * partition sizes after each division to update the algorithm mode. If the partition
62 * containing the target does not sufficiently reduce in size then the algorithm is
63 * progressively changed to use partitions with guaranteed margins. This ensures a set fraction
64 * of data is eliminated each step and worse-case linear run time performance. This method can
65 * handle a range of indices {@code [ka, kb]} with a small separation by targeting the start of
66 * the range {@code ka} and then selecting the remaining elements {@code (ka, kb]} that are at
67 * the edge of the partition bounded by {@code ka}.
68 *
69 * <p>Multiple keys are partitioned collectively using an introsort method which only recurses
70 * into partitions containing indices. Excess recursion will trigger use of a heapselect
71 * on the remaining range of indices ensuring non-quadratic worse case performance. Any
72 * partition containing a single index, adjacent pair of indices, or range of indices with a
73 * small separation will use the quickselect adaptive method for single keys. Note that the
74 * maximum number of times that {@code n} indices can be split is {@code n - 1} before all
75 * indices are handled as singles.
76 *
77 * <p>Floating-point order
78 *
79 * <p>The {@code <} relation does not impose a total order on all floating-point values.
80 * This class respects the ordering imposed by {@link Double#compare(double, double)}.
81 * {@code -0.0} is treated as less than value {@code 0.0}; {@code Double.NaN} is
82 * considered greater than any other value; and all {@code Double.NaN} values are
83 * considered equal.
84 *
85 * <p>References
86 *
87 * <p>Quickselect is introduced in Hoare [1]. This selects an element {@code k} from {@code n}
88 * using repeat division of the data around a partition element, recursing into the
89 * partition that contains {@code k}.
90 *
91 * <p>Introsort/select is introduced in Musser [2]. This detects excess recursion in
92 * quicksort/select and reverts to a heapsort or linear select to achieve an improved worst
93 * case bound.
94 *
95 * <p>Use of sampling to identify a pivot that places {@code k} in the smaller partition is
96 * performed in the SELECT algorithm of Floyd and Rivest [3, 4].
97 *
98 * <p>A worst-case linear time algorithm PICK is described in Blum <i>et al</i> [5]. This uses
99 * the median of medians as a partition element for selection which ensures a minimum fraction of
100 * the elements are eliminated per iteration. This was extended to use an asymmetric pivot choice
101 * with efficient placement of the medians sample location in the QuickselectAdpative algorithm of
102 * Alexandrescu [6].
103 *
104 * <ol>
105 * <li>Hoare (1961)
106 * Algorithm 65: Find
107 * <a href="https://doi.org/10.1145%2F366622.366647">Comm. ACM. 4 (7): 321–322</a></li>
108 * <li>Musser (1999)
109 * Introspective Sorting and Selection Algorithms
110 * <a href="https://doi.org/10.1002/(SICI)1097-024X(199708)27:8%3C983::AID-SPE117%3E3.0.CO;2-%23">
111 * Software: Practice and Experience 27, 983-993.</a></li>
112 * <li>Floyd and Rivest (1975)
113 * Algorithm 489: The Algorithm SELECT—for Finding the ith Smallest of n elements.
114 * Comm. ACM. 18 (3): 173.</li>
115 * <li>Kiwiel (2005)
116 * On Floyd and Rivest's SELECT algorithm.
117 * Theoretical Computer Science 347, 214-238.</li>
118 * <li>Blum, Floyd, Pratt, Rivest, and Tarjan (1973)
119 * Time bounds for selection.
120 * <a href="https://doi.org/10.1016%2FS0022-0000%2873%2980033-9">
121 * Journal of Computer and System Sciences. 7 (4): 448–461</a>.</li>
122 * <li>Alexandrescu (2016)
123 * Fast Deterministic Selection
124 * <a href="https://arxiv.org/abs/1606.00484">arXiv:1606.00484</a>.</li>
125 * <li><a href="https://en.wikipedia.org/wiki/Quickselect">Quickselect (Wikipedia)</a></li>
126 * <li><a href="https://en.wikipedia.org/wiki/Introsort">Introsort (Wikipedia)</a></li>
127 * <li><a href="https://en.wikipedia.org/wiki/Introselect">Introselect (Wikipedia)</a></li>
128 * <li><a href="https://en.wikipedia.org/wiki/Floyd%E2%80%93Rivest_algorithm">Floyd-Rivest algorithm (Wikipedia)</a></li>
129 * <li><a href="https://en.wikipedia.org/wiki/Median_of_medians">Median of medians (Wikipedia)</a></li>
130 * </ol>
131 *
132 * @since 1.2
133 */
134 public final class Selection {
135
136 /** No instances. */
137 private Selection() {}
138
139 /**
140 * Partition the array such that index {@code k} corresponds to its correctly
141 * sorted value in the equivalent fully sorted array.
142 *
143 * @param a Values.
144 * @param k Index.
145 * @throws IndexOutOfBoundsException if index {@code k} is not within the
146 * sub-range {@code [0, a.length)}
147 */
148 public static void select(double[] a, int k) {
149 IndexSupport.checkIndex(0, a.length, k);
150 doSelect(a, 0, a.length, k);
151 }
152
153 /**
154 * Partition the array such that indices {@code k} correspond to their correctly
155 * sorted value in the equivalent fully sorted array.
156 *
157 * @param a Values.
158 * @param k Indices (may be destructively modified).
159 * @throws IndexOutOfBoundsException if any index {@code k} is not within the
160 * sub-range {@code [0, a.length)}
161 */
162 public static void select(double[] a, int[] k) {
163 IndexSupport.checkIndices(0, a.length, k);
164 doSelect(a, 0, a.length, k);
165 }
166
167 /**
168 * Partition the array such that index {@code k} corresponds to its correctly
169 * sorted value in the equivalent fully sorted array.
170 *
171 * @param a Values.
172 * @param fromIndex Index of the first element (inclusive).
173 * @param toIndex Index of the last element (exclusive).
174 * @param k Index.
175 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
176 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the
177 * sub-range {@code [fromIndex, toIndex)}
178 */
179 public static void select(double[] a, int fromIndex, int toIndex, int k) {
180 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
181 IndexSupport.checkIndex(fromIndex, toIndex, k);
182 doSelect(a, fromIndex, toIndex, k);
183 }
184
185 /**
186 * Partition the array such that indices {@code k} correspond to their correctly
187 * sorted value in the equivalent fully sorted array.
188 *
189 * @param a Values.
190 * @param fromIndex Index of the first element (inclusive).
191 * @param toIndex Index of the last element (exclusive).
192 * @param k Indices (may be destructively modified).
193 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
194 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the
195 * sub-range {@code [fromIndex, toIndex)}
196 */
197 public static void select(double[] a, int fromIndex, int toIndex, int[] k) {
198 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
199 IndexSupport.checkIndices(fromIndex, toIndex, k);
200 doSelect(a, fromIndex, toIndex, k);
201 }
202
203 /**
204 * Partition the array such that index {@code k} corresponds to its correctly
205 * sorted value in the equivalent fully sorted array.
206 *
207 * <p>This method pre/post-processes the data and indices to respect the ordering
208 * imposed by {@link Double#compare(double, double)}.
209 *
210 * @param fromIndex Index of the first element (inclusive).
211 * @param toIndex Index of the last element (exclusive).
212 * @param a Values.
213 * @param k Index.
214 */
215 private static void doSelect(double[] a, int fromIndex, int toIndex, int k) {
216 if (toIndex - fromIndex <= 1) {
217 return;
218 }
219 // Sort NaN / count signed zeros.
220 // Caution: This loop contributes significantly to the runtime.
221 int cn = 0;
222 int end = toIndex;
223 for (int i = toIndex; --i >= fromIndex;) {
224 final double v = a[i];
225 // Count negative zeros using a sign bit check
226 if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) {
227 cn++;
228 // Change to positive zero.
229 // Data must be repaired after selection.
230 a[i] = 0.0;
231 } else if (v != v) {
232 // Move NaN to end
233 a[i] = a[--end];
234 a[end] = v;
235 }
236 }
237
238 // Partition
239 if (end - fromIndex > 1 && k < end) {
240 QuickSelect.select(a, fromIndex, end - 1, k);
241 }
242
243 // Restore signed zeros
244 if (cn != 0) {
245 // Use partition index below zero to fast-forward to zero as much as possible
246 for (int j = a[k] < 0 ? k : -1;;) {
247 if (a[++j] == 0) {
248 a[j] = -0.0;
249 if (--cn == 0) {
250 break;
251 }
252 }
253 }
254 }
255 }
256
257 /**
258 * Partition the array such that indices {@code k} correspond to their correctly
259 * sorted value in the equivalent fully sorted array.
260 *
261 * <p>This method pre/post-processes the data and indices to respect the ordering
262 * imposed by {@link Double#compare(double, double)}.
263 *
264 * @param fromIndex Index of the first element (inclusive).
265 * @param toIndex Index of the last element (exclusive).
266 * @param a Values.
267 * @param k Indices (may be destructively modified).
268 */
269 private static void doSelect(double[] a, int fromIndex, int toIndex, int[] k) {
270 if (k.length == 0 || toIndex - fromIndex <= 1) {
271 return;
272 }
273 // Sort NaN / count signed zeros.
274 // Caution: This loop contributes significantly to the runtime for single indices.
275 int cn = 0;
276 int end = toIndex;
277 for (int i = toIndex; --i >= fromIndex;) {
278 final double v = a[i];
279 // Count negative zeros using a sign bit check
280 if (Double.doubleToRawLongBits(v) == Long.MIN_VALUE) {
281 cn++;
282 // Change to positive zero.
283 // Data must be repaired after selection.
284 a[i] = 0.0;
285 } else if (v != v) {
286 // Move NaN to end
287 a[i] = a[--end];
288 a[end] = v;
289 }
290 }
291
292 // Partition
293 int n = 0;
294 if (end - fromIndex > 1) {
295 n = k.length;
296 // Filter indices invalidated by NaN check
297 if (end < toIndex) {
298 for (int i = n; --i >= 0;) {
299 final int index = k[i];
300 if (index >= end) {
301 // Move to end
302 k[i] = k[--n];
303 k[n] = index;
304 }
305 }
306 }
307 // Return n, the count of used indices in k.
308 // Use this to post-process zeros.
309 n = QuickSelect.select(a, fromIndex, end - 1, k, n);
310 }
311
312 // Restore signed zeros
313 if (cn != 0) {
314 // Use partition indices below zero to fast-forward to zero as much as possible
315 int j = -1;
316 if (n < 0) {
317 // Binary search on -n sorted indices: hi = (-n) - 1
318 int lo = 0;
319 int hi = ~n;
320 while (lo <= hi) {
321 final int mid = (lo + hi) >>> 1;
322 if (a[k[mid]] < 0) {
323 j = mid;
324 lo = mid + 1;
325 } else {
326 hi = mid - 1;
327 }
328 }
329 } else {
330 // Unsorted, process all indices
331 for (int i = n; --i >= 0;) {
332 if (a[k[i]] < 0) {
333 j = k[i];
334 }
335 }
336 }
337 for (;;) {
338 if (a[++j] == 0) {
339 a[j] = -0.0;
340 if (--cn == 0) {
341 break;
342 }
343 }
344 }
345 }
346 }
347
348 /**
349 * Partition the array such that index {@code k} corresponds to its correctly
350 * sorted value in the equivalent fully sorted array.
351 *
352 * @param a Values.
353 * @param k Index.
354 * @throws IndexOutOfBoundsException if index {@code k} is not within the
355 * sub-range {@code [0, a.length)}
356 */
357 public static void select(int[] a, int k) {
358 IndexSupport.checkIndex(0, a.length, k);
359 if (a.length <= 1) {
360 return;
361 }
362 QuickSelect.select(a, 0, a.length - 1, k);
363 }
364
365 /**
366 * Partition the array such that indices {@code k} correspond to their correctly
367 * sorted value in the equivalent fully sorted array.
368 *
369 * @param a Values.
370 * @param k Indices (may be destructively modified).
371 * @throws IndexOutOfBoundsException if any index {@code k} is not within the
372 * sub-range {@code [0, a.length)}
373 */
374 public static void select(int[] a, int[] k) {
375 IndexSupport.checkIndices(0, a.length, k);
376 if (k.length == 0 || a.length <= 1) {
377 return;
378 }
379 QuickSelect.select(a, 0, a.length - 1, k, k.length);
380 }
381
382 /**
383 * Partition the array such that index {@code k} corresponds to its correctly
384 * sorted value in the equivalent fully sorted array.
385 *
386 * @param a Values.
387 * @param fromIndex Index of the first element (inclusive).
388 * @param toIndex Index of the last element (exclusive).
389 * @param k Index.
390 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
391 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the
392 * sub-range {@code [fromIndex, toIndex)}
393 */
394 public static void select(int[] a, int fromIndex, int toIndex, int k) {
395 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
396 IndexSupport.checkIndex(fromIndex, toIndex, k);
397 if (toIndex - fromIndex <= 1) {
398 return;
399 }
400 QuickSelect.select(a, fromIndex, toIndex - 1, k);
401 }
402
403 /**
404 * Partition the array such that indices {@code k} correspond to their correctly
405 * sorted value in the equivalent fully sorted array.
406 *
407 * @param a Values.
408 * @param fromIndex Index of the first element (inclusive).
409 * @param toIndex Index of the last element (exclusive).
410 * @param k Indices (may be destructively modified).
411 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
412 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the
413 * sub-range {@code [fromIndex, toIndex)}
414 */
415 public static void select(int[] a, int fromIndex, int toIndex, int[] k) {
416 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
417 IndexSupport.checkIndices(fromIndex, toIndex, k);
418 if (k.length == 0 || toIndex - fromIndex <= 1) {
419 return;
420 }
421 QuickSelect.select(a, fromIndex, toIndex - 1, k, k.length);
422 }
423
424 /**
425 * Partition the array such that index {@code k} corresponds to its correctly
426 * sorted value in the equivalent fully sorted array.
427 *
428 * @param a Values.
429 * @param k Index.
430 * @throws IndexOutOfBoundsException if index {@code k} is not within the
431 * sub-range {@code [0, a.length)}
432 * @since 1.3
433 */
434 public static void select(long[] a, int k) {
435 IndexSupport.checkIndex(0, a.length, k);
436 if (a.length <= 1) {
437 return;
438 }
439 QuickSelect.select(a, 0, a.length - 1, k);
440 }
441
442 /**
443 * Partition the array such that indices {@code k} correspond to their correctly
444 * sorted value in the equivalent fully sorted array.
445 *
446 * @param a Values.
447 * @param k Indices (may be destructively modified).
448 * @throws IndexOutOfBoundsException if any index {@code k} is not within the
449 * sub-range {@code [0, a.length)}
450 * @since 1.3
451 */
452 public static void select(long[] a, int[] k) {
453 IndexSupport.checkIndices(0, a.length, k);
454 if (k.length == 0 || a.length <= 1) {
455 return;
456 }
457 QuickSelect.select(a, 0, a.length - 1, k, k.length);
458 }
459
460 /**
461 * Partition the array such that index {@code k} corresponds to its correctly
462 * sorted value in the equivalent fully sorted array.
463 *
464 * @param a Values.
465 * @param fromIndex Index of the first element (inclusive).
466 * @param toIndex Index of the last element (exclusive).
467 * @param k Index.
468 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
469 * bounds of range {@code [0, a.length)}; or if index {@code k} is not within the
470 * sub-range {@code [fromIndex, toIndex)}
471 * @since 1.3
472 */
473 public static void select(long[] a, int fromIndex, int toIndex, int k) {
474 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
475 IndexSupport.checkIndex(fromIndex, toIndex, k);
476 if (toIndex - fromIndex <= 1) {
477 return;
478 }
479 QuickSelect.select(a, fromIndex, toIndex - 1, k);
480 }
481
482 /**
483 * Partition the array such that indices {@code k} correspond to their correctly
484 * sorted value in the equivalent fully sorted array.
485 *
486 * @param a Values.
487 * @param fromIndex Index of the first element (inclusive).
488 * @param toIndex Index of the last element (exclusive).
489 * @param k Indices (may be destructively modified).
490 * @throws IndexOutOfBoundsException if the sub-range {@code [fromIndex, toIndex)} is out of
491 * bounds of range {@code [0, a.length)}; or if any index {@code k} is not within the
492 * sub-range {@code [fromIndex, toIndex)}
493 * @since 1.3
494 */
495 public static void select(long[] a, int fromIndex, int toIndex, int[] k) {
496 IndexSupport.checkFromToIndex(fromIndex, toIndex, a.length);
497 IndexSupport.checkIndices(fromIndex, toIndex, k);
498 if (k.length == 0 || toIndex - fromIndex <= 1) {
499 return;
500 }
501 QuickSelect.select(a, fromIndex, toIndex - 1, k, k.length);
502 }
503 }