# Degenerate boundaries for multiple-alternative decisions | Panda Anku

### Problem setup and aim

To investigate the form of the optimal decision boundary for multiple choices, we follow the usual convention that choice evidence is modeled by overlapping normal distributions23. Each choice (hypothesis) Hi is represented by a normal distribution with vector mean μi and standard deviation σi. These parameters μi, σi are defined by the inter-choice discriminability, which is the amount of overlap between choice distributions: the less overlap between distributions i and j, the more discriminable the choices and easier the task24. We assume equal discriminability between all choices, and so all choice distributions are equivariant with equidistant means, which is achieved by using vector-valued evidence (see Methods). This means that for each decision episode, the “true” hypothesis is equally indiscriminable from all other hypotheses, giving a consistent n-alternative forced choice (nAFC) paradigm regardless of which hypothesis is chosen.

The integration-to-threshold model samples evidence from the ‘true’ hypothesis until a decision boundary is reached. Each choice distribution represents possible evidence for that hypothesis, originating from the environment, memory, or noisy sensory processes18. At each time step, a sample is taken and inference is performed on the evidence accumulated thus far, generating a decision trajectory. The decision time T is when this trajectory crosses a boundary for a particular choice. If the boundary crossed represents the “true” hypothesis, then zero error e = 0 is generated, whereas crossing any other boundary generates a unit error e = 1. Usually, integration-to-threshold models rely on scalar evidence with a scalar decision boundary. In our case, the evidence will be a vector with boundaries that are hyper-surfaces in a vector space, which is detailed in the next section.

At the end of a decision episode, when a choice is made, the decision time and error are combined into a single reward. Here, we formulate reward as a linear combination of error and decision time weighted by their associated costs Wi and c:

$$r=left{begin{array}{ll}-{W}_{i}-cT,&{{{{{{{rm{incorrect}}}}}}}},{{{{{{{rm{decision}}}}}}}}\ -cT,hfill &{{{{{{{rm{correct}}}}}}}},{{{{{{{rm{decision}}}}}}}}.end{array}right.$$

(1)

This is a standard reward function used in a wide range of past work, for example, ref. 7,25. Unequal error costs Wi ≠ Wj induce choice-dependent reward, where hypothesis-dependent error costs are relative to the “true” hypothesis and to each other. For tractability, we will assume all error costs are equal, with the expectation that similar results hold in the unequal cost case but that the analysis will be more complicated. We also consider a constant (time-independent) cost c per time step, assuming stationarity of evidence distributions and evidence accumulation in a free-response task. A challenging aspect of this framework is that reward is highly stochastic due to the random nature of evidence sampling. How then do we define optimality?

In this paper, we come from the view that humans and animals maximize expected reward19,26,27,28. Then the optimum decision boundary maximizes the average reward for a given ratio of costs c/W. Monte Carlo simulations of decision trajectories of independent trials, using the formalism outlined above and the evidence inference method derived in the next section, yield reward values for a set of candidate decision boundaries. In general, we find a set of high-dimensional nonlinear, complex boundaries. We will show that these boundaries are consistent with a range of behavioral and phenomenological results along with testable neurophysiological predictions.

### Multi-alternative decision-making as a particle diffusing in n-dimensions

In this section, we show that n-alternative decision-making can be viewed as a diffusion process in an (n − 1) dimensional subspace of the belief space. This is a perspective that has previously been established (for example, see refs. 7, 14), but we cover this material here to help the reader build intuition and to detail the implications for multi-alternative decisions.

For 2AFC tasks, integration-to-threshold models such as the sequential probability ratio test (SPRT, see Methods), represent the decision trajectory as a particle diffusing in 1D. If we define this dimension on the y-axis with the origin corresponding to the point of equal belief for each hypothesis, then positive y-values represent greater belief in choice H0 and negative values greater belief in choice H1. If time is represented along the x-axis, the decision trajectory takes the form of a random walk over a range of y-values, with decision boundaries y ≥ ± θ0,1 for the two decisions 0, 1. This bounded random walk model can be extended to nAFC tasks, the walk taking place in (n − 1)-dimensions. However, this extension is not straightforward. Firstly, belief in hypotheses H0,1 are defined over the positive and negative real numbers of a single dimension, which raises the question of how belief in another hypothesis H2 should be represented. Secondly, the decision boundaries in 1D are well defined as a pair of single bounds (θ0 < y < θ1), but as the belief space extends to n-choices, how should the decision boundaries be represented?

To examine these questions, we take a Bayesian sequential inference perspective in which 1D decision variables in models like the SPRT are deconstructed into two decision variables that represent the degree of belief in two hypotheses H0 and H1. By using the sequential Bayesian inference beliefs directly, the positive/negative range for the 1D decision variable is split into two independent axes that represent normalized belief over each hypothesis as a separate decision trajectory, given by the posterior probability Pi(t) = P(Hix(1:t)) where x(1:t) is the accumulated evidence at time t (see Fig. 1).

The decision variable transformation between the SPRT and sequential Bayesian inference is straightforward (Fig. 1c, d). A decision variable (DV) represents the accrual of all sources of priors and evidence into a quantity that is interpreted by the decision rule to produce a choice4. The DV of the SPRT is the log posterior probability ratio29 and the DV of sequential Bayesian inference is simply the posterior probability. Because sequential Bayesian inference is constrained by P0(t) + P1(t) = 1, it has the same number of unconstrained degrees of freedom as SPRT. Moreover, the boundary values are equivalent under the DV transformation from SPRT boundaries θ0,1 to boundaries on the posteriors Θ0,1; specifically, the SPRT thresholds are given by the log-odds of the corresponding posterior thresholds in sequential Bayesian inference4,

$$theta=log (Theta /1-Theta ).$$

(2)

Now, the key point is that sequential Bayesian inference applies to an arbitrary number of choices and so holds for general nAFC decision-making14. Figure 1c, d illustrate this for 3- and 4-choice tasks, respectively, with the dashed lines representing flat decision boundaries Pi(t) > Θi. Individual probability trajectories for each choice correspond to the coordinates of the overall decision trajectories (Fig. 1a, b), interpreted as a particle diffusing in nD. Sequential Bayesian inference forms an orthogonal coordinate system for each probability trajectory (Fig. 1c, d) as components of the n-dimensional decision trajectory (Fig. 1a, b).

There are geometric implications of using sequential Bayesian inference as coordinates for n-dimensional decision trajectories (Fig. 1a, b). Although the decision trajectories have n probability-coordinates, Pn, they are constrained such that ∑iPi(t) = 1; therefore, the decision trajectories populate (n − 1)D simplices. For example, 2AFC decision dynamics are represented as a particle constrained to have P0(t) + P1(t) = 1, which is the 1D line P1 = 1 − P0 on a 2D (P0, P1) plot of the beliefs. It follows that 3AFC dynamics take place on a 2D plane (Fig. 1a) and 4AFC dynamics in a 3D tetrahedron (Fig. 1b) and so forth. Note that if any hypothesis has zero probability Pi(t) = 0, then the space in which the decision trajectory evolves collapses to the remaining non-zero directions. For example, each face of the 4-choice tetrahedron in Fig. 1b is a combination of three choices with non-zero probabilities and each edge a combination of two such choices.

As a result, decision boundaries are (n − 2)-dimensional objects in n-dimensional probability space. So for n > 2 choices, boundaries can have spatial dependence with respect to the decision space visualized in Fig. 1a, b. For 2AFC tasks, decision boundaries are points on a 2D line, which are simply the transformed boundaries (equation 2) of the standard two-choice integration-to-threshold model. Likewise, for 3AFC, the decision boundaries are lines on a plane (Fig. 1a, dashed lines) and for 4AFC are planes within a tetrahedron (Fig. 1a, dark gray planes). The example boundaries shown are flat with a constant decision threshold in each dimension. An interesting consequence is that high-dimensional boundaries can have a nonlinear structure as a function of the n-dimensional beliefs P. Then, the linear 3AFC boundaries (Fig. 1a, dashed lines) generalize to curves and the planar 4AFC boundaries (Fig. 1b, dark gray planes) generalize to curved surfaces.

Curved decision boundaries have been shown to perform optimally on 3AFC tasks for free-response, mixed-difficulty trials7; however, it is not known how important the precise shape of that boundary is for maximizing reward. Here we ask whether there are other complex boundary shapes that improve performance over the flat boundary case and whether the greater freedom to choose nonlinear boundaries has other consequences for decision-making.

### Multi-dimensional decision boundaries can be complex

To investigate the importance of boundary shape for reward maximization, we define a subset of possible boundaries using some specific spatial parameterizations that provide diverse sets of nonlinear boundaries. These parameterizations are constrained such that: (I) each boundary θi intersects with each edge leading away from the point Pi = 1, and likewise intersects with each (hyper)plane leading away from the said point (e.g., each colored boundary intersects with two edges in Fig. 2); and (II) assuming symmetric error costs Wi = Wj for simplicity in equation (1), the boundaries remain symmetric under permutations Pi ↔ Pj (e.g., all boundaries have the reflectional symmetries of the outer equilateral triangle in Fig. 2).

These constraints can be used to derive a general boundary parameterization comprising a shape function and tuning parameters (Fig. 2). A general boundary parameterization F(P(t); θ, α, …) takes the probability vector P(t) as an input, along with an edge-intersection parameter θ and shape parameters (α, …) to give a decision rule:

$${P}_{i} (3) The resulting complex decision boundary has an amplitude parameter α and some additional shape parameters. To make our investigation tractable, we limit our parameterization to one additional parameter, β (e.g., a frequency in the oscillating case). For simplicity, we select four distinct forms of F that we call flat(θ), curve(θ, α), power(θ, α, β), and oscil(θ, α, β), examples of which are shown in Fig. 2 and all of which contain the flat boundary as a particular instance (see Methods for the full forms and a mathematical derivation). Within these parametric subsets, the optimal decision boundaries are determined by optimal values of θ, α, and β. Some example boundary parameterizations illustrate the range of possible boundary features and how they extend to multiple-choice decision tasks (Figs. 2 and 3). Each parameterization is a scaling of a flat boundary (Fig. 2, left column) denoted as the flat(θ) function, such that: (I) The function curve(θ, α) is the simplest parameterization of interest, with α the amplitude of the curve (Fig. 2, second column). (II) The function power(θ, α, β) has an additional parameter β that modulates a double curve or forms a central peak (Fig. 2, third column). (III) The oscillatory function oscil(θ, α, β) is a cosine with amplitude α and frequency β (Fig. 2, fourth column). We have chosen these parameterizations so that if β = 0, we recover the curve parameterization, and if α = 0, we recover the flat parameterization. These parameterizations apply to any number of choices n ≥ 2, with examples of the curve and oscil functions for 4AFCs shown in Fig. 3a, b respectively. Note how these decision boundaries intersect with each 3AFC plane: each face in Fig. 3a matches a panel in Fig. 2. Overall, we have constructed a set of permutation-invariant, nonlinear decision boundaries that we will use as candidate functions to explore optimal decision rules. This raises the question of which parameter values give optimal boundaries within these parametrized subsets of boundaries. ### Complex decision boundaries are consistent with the speed-accuracy curve It is well established that humans and animals generate speed-accuracy trade-off (SAT) curves during decision-making experiments6,9, showing the mean error against mean decision time, where each point on the curve can be accorded a cost ratio c/W of time to errors (equation (1)). For 2AFC tasks, the trend is such that as the value of c/W increases, speed is favored over the accuracy, and so the mean decision time decreases with a compensatory increase in mean decision error. This trade-off is instantiated by the decision rule (learned for each c/W value) with the SAT curve implicitly parameterized by decision boundary parameters. For 2AFCs, this is a single parameter: the flat boundary threshold θ30,31. For nAFCs with n > 2, these are complex boundary functions (equation (3)) with sets of parameters (α, β, …). If the SAT curve is truly a curve, rather than a region, then multiple parameter combinations would give the same SAT, since a curve requires just one implicit parameter. To examine the SAT curves for each parameterization, we optimized the boundary parameters for a range of cost ratios c/W and then plotted the resulting accuracies and decision times (Fig. 4). Optimal parameters θ, α, β were found by stochastic optimization over the reward landscapes, using Monte Carlo simulation over a grid search of parameters to generate a visualization of the reward landscapes (Fig. 5a–c). As rewards are highly stochastic, smoothed landscapes were averaged over 100,000 decision trajectories. Optimal parameters were extracted by taking all points with a mean reward for which (r ; > ;{r}_{max }-delta sigma), where ({r}_{max }) is the maximum mean reward of the noisy landscape, σ is the spread of rewards, and δ = 0.02 is a small parameter (see Methods for details). Since the variation in expected reward is so small as to be negligible for a real decision-maker with limited capacity for sampling rewards from the environment, we define these boundaries as constituting a set of “good enough” boundaries that are in practice as effective as a true optimum within each parameterization. The boundary functions curve, power, and oscil produce smooth, well-defined SAT curves (Fig. 4) that resemble the relationship between speed and accuracy for optimal rewards found experimentally2,31. Despite the wide range of boundary shapes they describe, all three parameterizations produce nearly identical SAT curves, as confirmed by overlaying the average over all three cases (Fig. 4, black dashed curve, all panels). Flat boundaries (α = 0) are contained within these parameterizations, and so the SAT curves for all three boundary functions closely resemble the flat-boundary case. One difference between the three cases is their relative spreads: the curve(θ, α) parameterization yields the tightest SAT curve, followed by the oscil(θ, α, β) SAT curve, and finally, the power(θ, α, β) SAT curve is the thickest. One might therefore consider that the curve parameterization qualifies as the ‘best’ SAT curve, which is consistent with previous work since it describes the shape of the optimal boundary found for the 3AFC case in refs. 7,14. However, all parameterizations closely follow a single mean SAT curve (black dashed lines), so a wide range of boundary characteristics give near-optimal decisions. Moreover, each value of c/W has multiple points spread along the same curve, so the SAT can be satisfied by multiple optimized boundaries even within the same parameterization. ### A degenerate set of decision boundaries yield close-to-optimal expected reward So far, we have seen that optimized nonlinear decision boundaries generate well-defined SAT curves that remain similar in three parameterizations curve, oscil, and power. Next, we analyze the optimized boundaries by direct inspection of the reward landscapes and their position on the SAT curve. Inspection of the 3AFC reward landscapes for the curve parameterization reveals that the region with mean rewards within δ of the maximum ({r}_{max }) extends across the parameter space (Fig. 5, black lines). As c/W increases and the maximum reward ({r}_{max }) decreases (panels a–c), this acceptance region sweeps towards θ = 1/3 and the flat boundary α > 0 and becomes more dependent on α in addition to θ. These acceptance regions correspond to sets of optimized parameter combinations and so specify families of decision boundaries that all maximize reward within a small variance. For closer scrutiny of the optimal region, five sections of the reward landscapes for 3AFC curve(θ, α) parameterization are shown in Fig. 6. These sections correspond to values α = {−20, 10, 0, 10, 20} (including the flat boundary α = 0) to provide a detailed look at the reward landscape peaks with 100-fold more samples of θ than in Fig. 5. Evidently, the peak changes with cost ratio c/W and θ (Fig. 6, right column). This analysis leads to the observations: (I) As c/W increases, the reward landscape maximum covers a broader range of θ values (section peaks separate) and appears to acquire slope (peaks diverge in height, with a higher-to-lower pattern). (II) The spread of the average rewards at the peak of each section overlaps (red dashed lines), decreasing as c/W increases but not diminishing to zero. (III) Extracting near-to-optimal parameters by taking all points within a small δ = 0.02 range of the peak average reward yields a set of near-optimal decision boundaries over a broad range of θ and α values (black points in Fig. 6). These three observations all support the effective degeneracy of optimized decision boundaries within the parameterizations. Observation (I) shows that for small c/W, the underlying structure of the reward landscape appears to degenerate with sections almost entirely overlapping (Fig. 6, top right). As c/W increases, a shallow structural maximum becomes apparent (Fig. 6, bottom right). Observation (II) shows significant overlap in the close-to-optimal region even across the apparent structural maximum (Fig. 6, bottom right). Observation (III) shows directly that there is an effective degeneracy. One could question whether different sections through the reward landscape would change these observations, as Fig. 6 depends on the range and discretization of α. Our range of α covers the entire range of boundary shapes shown in Fig. 2, including flat boundaries, and because of the gradual variation across sections, we would not expect further structure from a finer discretization. We also expect that using more Monte Carlo samples for each cross-section would not change the results, as the means and spreads shown in the sections in Fig. 6 appear to be good estimates of the distributions of average rewards (e.g., by their smooth variation with θ and unitary maxima). The close-to-optimal set of boundaries produces a range of speed-accuracy trade-offs (Fig. 5d–f). In 2AFC decision-making, each point on the SAT curve is a unique optimal boundary specifying a unique trade-off for a given value of c/W. This raises the question: for nAFCs with n > 2, what are the range of points on the SAT curve given by the set of close-to-optimal boundaries? Fig. 5d shows the mean decision errors against mean decision times for all close-to-optimal boundaries found in the 3AFC landscapes from Fig. 5a–c. In each case, an effectively-optimal reward is achieved by a broad range of SATs (Fig. 5d) rather than a tight group around a single SAT. How can a small range of reward values produce a broad range of speed-accuracy trade-offs? The breadth of speed-accuracy trade-off values produced by complex decision boundaries is explained by a range of near-optimal threshold parameter values. Then, given cost values W and c, the rewards$${mathbb{E}}({r}_{max })=-W{mathbb{E}}(e)-c{mathbb{E}}[T],; e={0,1},; ({{{{{{{rm{correct}}}}}}}}/{{{{{{{rm{incorrect}}}}}}}},{{{{{{{rm{decision}}}}}}}})$$(4) have two degrees of freedom for each value of ({mathbb{E}}({r}_{max })) in trading off the expected error ({mathbb{E}}(e)) and expected decision time ({mathbb{E}}[T]). Thus, the same expected reward may be attained by boundaries with different combinations of ({mathbb{E}}(e)) and ({mathbb{E}}[T]). Hence, the set of close-to-optimal decision boundaries yields the range of (({mathbb{E}}(e),,{mathbb{E}}[T])) solutions shown in Fig. 5d–f. ### Effectively-optimal decisions with different boundary shapes What are the characteristics of the nonlinear decision boundaries in the near-optimal set? If the decision boundaries were qualitatively similar in shape, then learning the particular boundary shape would be important for maximizing reward. Conversely, if the boundary shapes are qualitatively different, then learning the precise boundary shape would not be critical, and the emphasis on optimality would shift to the inference and normalization processes for multiple-choice decision-making. Figure 5e shows that for each cost ratio c/W, the edge-intersection parameter θ seems to be the main determinant of optimality, as is visible in the homogenous values of θ within each region (a–c). In contrast, the shape parameter α takes heterogenous values within each region (a–c) in Fig. 5f. For every cost ratio c/W, the entire explored range of α is represented in the optimal set, whereas there is a narrow range of θ that varies with c/W. The flat-boundary case (α = 0) is optimal for each of the degenerate sets, with the optimal θ then a single value that lies within the broadened range when α is non-zero. Thus, for all c/W, there are many SATs near each parameter value, and in turn, each SAT instance is close to many different parameter values (Fig. 5e, f). Therefore, it appears that close-to-optimal multi-alternative boundaries are possible with significant modulation of the flat-boundary case. The close-to-optimal set contains a broad range of parameters, giving a diverse set of boundary shapes (examples in Fig. 7). This supports the notion that learning the precise boundary shape is less important for making effective optimal decisions. ### Mean error and decision time vary along the optimal decision boundaries Every point on an extended boundary for nAFC decision-making with n > 2 has an associated error and decision time distribution (see Fig. 7 for a color map of the mean decision time for flat boundaries). In contrast, the 2AFC decision thresholds are points on a line in the space of 2D belief vectors P from (0, 1) to (1, 0). Each choice on an extended boundary is a single point with an error and decision time distribution. Spatially-dependent error and decision time distributions that vary along the decision boundary are a consequence of having a multi-dimensional belief vector that can vary over a decision boundary embedded as a curve or surface in the higher-dimensional belief space. Hence, the close-to-optimal set of boundaries have different ranges of mean decision times and mean errors but practically indistinguishable expected rewards. If the under-determinism in the reward structure were eliminated (e.g., by also minimizing mean error or mean decision time), then the set of reward-maximizing decision boundaries would contract and SATs narrow. Interestingly, decision times vary even along flat boundaries (Fig. 7a, yellow lines; decision times shown by background color), and so even the simplest case of crossing a single threshold is complicated for multiple alternatives. ### Implicit dynamics of optimal decision boundaries support both static and collapsing thresholds The point where a decision trajectory crosses a high-dimensional decision boundary is a belief vector that has a corresponding mean error and mean decision time. Because all of these quantities can vary along a nonlinear boundary, there can appear to be non-trivial dynamics in the decision ‘threshold’ if it is instead interpreted as a unitary value rather than as a boundary function. Here we refer to this property as implicit dynamics because it originates in the boundary shape rather than from an explicit time-dependence of the threshold. In Fig. 7, the boundary overlays a gradient coloring representing decision time, making explicit the non-trivial relation between decision time and belief of a decision. In this sense, we uncover temporal ‘dynamics’ implicit in the static, complex, and nonlinear decision boundaries for multiple choices. There has been much debate over whether temporally-dynamic decision thresholds give a better account of 2AFC experimental data than the static thresholds of SPRT15,16,18. The assumption of fixed (constant-valued) thresholds has been called into question, with collapsing thresholds gaining popularity, which are sometimes interpreted as urgency signals9,19,30,32. From an optimality perspective, collapsing thresholds are more appropriate for repeated free-response trials of mixed difficulty and for those with deadlines19,21, whereas static thresholds are appropriate for single free-response trials without deadlines and repeated free-response trials of known difficulty; however, static thresholds are not adequate for single free-response trials of mixed, a priori unknown difficulty25. Models with collapsing thresholds have been shown to reduce the skew of error and decision time distributions in some experimental tasks16, and urgency signals proposed to account for increased firing rates in the LIP brain region of macaques during trials in which accumulated evidence (encoded as neuronal firing rates) is unchanging10,11,30. How multiple-choice decision boundaries relate to this debate is thus of interest. To investigate the relationship between implicit decision threshold dynamics discussed above and spatial nonlinear decision boundaries, we transform the decision variables to a form that gives a temporal structure in the threshold as a consequence of the extended static boundaries. The boundary beliefs are sorted by decision time, averaging over boundary values with identical decision times. These then appear as time-dependent decision boundaries applied to evidence (Fig. 8), which for display purposes, we represent using the log-odds (equation 2). These dynamic decision thresholds have a range of temporal structures for each cost ratio c/W, separating naturally into three categories: increasing (Fig. 8, top row), collapsing (middle row), and static thresholds (bottom row). These categories appear to correlate with the shape of the decision boundary: increasing thresholds with convex boundaries (e.g., Fig. 7a, dark blue curve), decreasing thresholds with concave boundaries (e.g., Fig. 7a, red curve), and static thresholds with flat boundaries (e.g., Fig. 7a, yellow line). This correlation seems to originate in an increase in decision time as the belief moves away from equality between choices (Fig. 7a–c, shading). All things being equal, decisions that terminate with higher beliefs tend to be more accurate, whereas decisions terminating with a low belief of the choice tend to be less accurate. Therefore, experimental observation of implicitly dynamic decision boundaries (including increasing thresholds) could be due to time-dependent accuracy plots of data from individual subjects33. We emphasize that while these implicit threshold dynamics look like temporal dependence, they are, in fact due to the spatial structure of the decision boundary. Flat decision boundaries can therefore be interpreted as a special case where the boundary on the belief in one choice to cross its threshold is independent of the beliefs of the other choices; however, even then, the mean decision time and mean error have a spatial structure along these boundaries (Fig. 7). ### Comparison to current models, interpretations, and predictions There are few normative approaches to modeling multiple-choice decision-making, with the recent study of ref. 7 the state-of-the-art in using an evidence accumulation vector accumulated in a race model with a boundary in n-dimensional space. Using dynamic programming, they find optimal decision boundaries for free-response, mixed-difficulty trials are nonlinear and collapse over time. Clearly, their nDRM is closely linked to the model presented here, but there are key differences: the models have a different evidence structure, trial structure, and optimization process, which leads to diverging perspectives on the nature of multiple-choice decision boundaries. In the following, we describe how these perspectives can be reconciled, and in doing so gain a broader understanding of normalization and inference in multiple-choice decision-making. In comparison with the nDRM, one difference between our model and that in ref. 7 is in how they use Bayesian inference to accumulate evidence. Although the nDRM is derived from Bayes’ rule, evidence is accumulated linearly and inference values (the posterior) are used indirectly to calculate the expected reward7. Conversely, our model, which focuses more on the details of nonlinear decision boundaries, uses the posterior from Bayes’ rule as the accumulated evidence. In this respect, our model has less biophysical realism that the nDRM because there is little supporting evidence for the brain representing posterior beliefs directly as probabilities. Instead, studies point towards the brain employing indirect representations from which beliefs can be inferred33,34. However, we describe below how either view of evidence representation is compatible with two of the main results of the present paper. Firstly, evidence accumulation takes place on an n − 1-dimensional simplex in our model, as in Fig. 1a, b (n = 2 a curve; n = 3, a plane). Similarly, reward maximization in the nDRM results in evidence accumulation perpendicular to a diagonal equidistant from all evidence-component axes, and so the space collapses to n − 1 dimensions. This subspace appears to be a scaling of the posterior simplices shown in Fig. 1a, b, which is simply a normalization of the evidence accumulation (see Methods). In consequence, both models agree that normalization of the evidence is a key component of multiple-choice decision-making, which is imposed by using probability values directly here and emergent in the nDRM. Further, the decision boundaries in both models exist in that same subspace, and so agree as to the expected dimensionality of neural population activity during evidence accumulation (see later section on Predictions). Secondly, our model results in a set of close-to-optimal decision boundaries that contains drastically different boundary shapes, which we now argue is compatible with the nDRM. Tajima et al. use a network approximation to the optimal decision rule to separate the boundary components into a race model with nonlinear and dynamic (collapsing) boundaries, both with normalization, and evaluate their performance under reward maximization7. Their model performance is best in the presence of internal variability (adding noise), where omitting boundary dynamics performs similarly or slightly better than the combination of nonlinearity, dynamics, and normalization together. From our perspective, these distinctly different types of boundaries could be interpreted as members of a set of close-to-optimal boundaries since both types result in effectively maximal rewards. The effect of evidence normalization is also important. Both the model presented here and the nDRM rely on evidence normalization as key to model performance. In the model here, the normalization of evidence follows from using posteriors, whereas in the nDRM it follows from a projection of the accumulated evidence onto the n − 1-dimensional subspace described above. This suggests a major influence on optimality for multiple-choice tasks, yet an assessment of the contribution of normalization alone on performance is absent in ref. 7: in the nDRM, normalization is not separated from the nonlinearity of the decision boundaries. For this reason, and because7 uses mixed difficulties which demand collapsing boundaries, comparing the performance to flat boundaries for the nDRM is not comparable to the flat boundaries examined here. Normalization appears integral to optimal multiple-choice decision-making, and the magnitude of its influence may offer an explanation for degenerate optima: it appears that boundary shape has a lesser influence on optimality when the evidence is normalized, which would benefit learning and generalization as a general neural mechanism for context-dependent decision-making35. Evidence normalization is also a mechanism that satisfies a number of physiological constraints. Represented by the range of neural activity, normalization satisfies: (I) Biophysical constraints of the range of activity of neural populations – the firing rate of biological neurons cannot be negative and cannot exceed a certain level due to their refractory period13; (II) neural recordings of decision-making tasks show that a decision is triggered when neural activity reaches a stereotyped level of activity9, and (III) as the number of options increases, so too does the processing and representation of multivariate evidence accumulation and the relative belief over these options. This, together with the wide-ranging influence of normalization on optimality discussed previously, leads us to consider normalization as an integral part of the evidence accumulation process. We now show that normalization, here in the form of a posterior representation of the evidence, can explain some ‘irrational’ behaviors: (a) the decrease in offset activities in multi-alternative tasks; (b) violation of the independence of irrelevant alternatives, and (c) violation of the regularity principle9,36,37,38,39. These behaviors are outcomes of three properties of normalization when adding (for example) a third option P2, where the currently-held beliefs are P0 and P1 such that P0 + P1 = 1; then the new normalized probabilities are$${tilde{P}}_{i}=frac{1}{1+{P}_{2}}{P}_{i}.$$(5) This has the effect of increasing the minimum distance dT = T − Pi/(1 + P2) from each choice belief to a boundary T, decreasing the distance d = Pi/(1 + P2) − 1/3 from the flat prior, and reducing the difference in belief values supporting each alternative$${Delta }_{i,j}=frac{1}{1+{P}_{2}}|{P}_{i}-{P}_{j}|,quad quad ine j.

(6)

All of these quantities decrease as the belief value of option P3 increases, which has the following consequences.

First, we consider (a) a decrease in offset activities in multi-alternative tasks. Multiple studies show that the initial average neural activity (“offset”) encoding evidence accumulation decreases as the number of options increases9,36. Support for this offset behavior is given in ref. 7 for the network model of the nDRM by introducing it directly into the reward maximization as the number of options is increased. However, assuming that evidence accumulation uses the posterior directly, as in our model, then for each unit increase in the number of choices, the average belief per choice decreases by 1/n(n + 1); i.e., the decrease in offset activity is a direct consequence of evidence normalization.

Next, we consider (b) the violation of IIA. The independence of irrelevant alternatives (IIA) recurs in many traditional rational theories of choice40,41. It asserts that the presence of an ‘irrelevant’ option should not affect the choice between “relevant” options (e.g., adding a low-value choice to existing higher-value choices)42,43,44. Violation of this principle has been shown across behavioral studies in both animals and humans37. This behavior is replicated in7 using a network approximation of the nDRM with added noise during evidence accumulation and is attributed to their use of divisive normalization. In our model, the violation of IIA is explained by the representation of evidence accumulation as the belief vector (equations (5, 6)): as the belief of the third option (P2) increases, the belief values of other options are reduced, which necessitates further evidence accumulation before making a decision than would otherwise have been required. Further, the difference in belief values supporting both high-valued options is decreased (equation (6)), so more evidence is needed to choose between these options also. Requiring additional evidence accumulation, and so increased difficulty in choosing between the two high-valued options, is exhibited by longer decision times and/or higher error rates, as in behavioral studies.

Last, we consider (c) the violation of the regularity principle. The inverse of IIA violation, the regularity principle, says that adding extra options cannot increase the probability of selecting an existing option, and has been found to be violated in behavioral studies38,39. This is simulated in7 using the same network approximation of the nDRM as for the violation of IIA. We also find that violation of regularity is congruent with IIA violation: adding a third option reduces the belief value of the original options (equation (5)) while also reducing the difference in belief values (equation (6)).

Our next consideration is related to network models of decision-making. In the nDRM, evidence accumulation is implemented in a straightforward manner, but the optimal boundaries are complex and nonlinear7. The optimal decision policy is approximated by a recurrent neural circuit that implements a nonlinear transformation of the accumulators, which simplifies the decision rule to a simple winner-takes-all rule when an accumulator reaches a single threshold. The decision rule is then simple, local, single-valued, and applies to each population independently. An interesting question is whether a similar approximation and neural implementation can be implemented with the nonlinear boundaries presented here. In principle, our model could be approximated by applying normalization (as in ref. 7) and a nonlinear transformation of the decision variable to remove the nonlinear component of the boundary. However, it is unclear whether a recurrent network (as in ref. 7) exists to transform the decision variables for the more complex power and oscil boundaries. If it is possible to represent the nonlinearities within a larger recurrent network, one would obtain a local, single-valued decision rule applied independently to each accumulator.

### Reproduction of other experimental findings

Hick’s law in choice RTs: Hick’s law is a benchmark result relating decision time to the number of choice alternatives. The relationship is classically log-linear in the form (bar{{{{{{{{rm{RT}}}}}}}}}=a+blog (n)). This relationship is found for both perceptual and value-based implementations of the nDRM for a set cost ratio45,46,47. Our model also replicates this relationship for a number of cost ratios (Fig. 9a, colored lines). Interestingly, both the slope and intercept vary with the cost ratio c/W.

SAT offset and slope: It has been reported that increasing the number of choices n results in a steeper slope and larger values for decision time versus coherency, along with a steeper slope and larger values of mean error9. For our model, we find that the SAT curves move away from the origin with increasing choices (Fig. 9b). Notice that the slope of the curves also decreases with the number of choices, which is not inconsistent with ref. 9.

Dynamic thresholds: Urgency signals have been observed in neural recordings from area LIP during multiple-choice tasks8,9 and are often interpreted as implementing collapsing decision thresholds, although this is still under debate15,16. For multiple-choice boundaries, temporal dynamics of this type have been found for the optimized nDRM7. Our model shows that the appearance of urgency signals could, in part, be explained by the change of decision times along nonlinear decision boundaries (Figs. 7 and 8), giving rise to what looks like temporally-dynamic thresholds. However, boundaries that are nonlinear in evidence (but linear in time) do not apply to 2AFC tasks. Therefore, they cannot act as a catch-all explanation for the urgency signal since this signal has also been observed during 2AFC tasks. Additionally, our results also reproduce dynamic boundaries with increasing or mixed gradients, as found in17, although not as a result of mixed-difficulty trials.

### Predictions

The nDRM makes several predictions pertaining to both behavior and neural implementation7. These also apply to the model presented here but stem directly from the mechanism of posterior probabilities as evidence accumulation with the consequent boundary degeneracy. Here we make a variation of these predictions and so a means of distinguishing empirically between the nDRM and a distinct class of models, encompassing the one here, where the integrators are normalized to represent probabilities48,49.

Firstly, the depression of neural activity prior to evidence accumulation (the offset) increases with the number of alternatives, as found in neural recordings of area LIP9. We find that this can be attributed directly to the decreasing value of the priors. We, therefore, predict that mean offset is independent of modulations of the task, such as changes in reward rate or the learnt SAT. This is contrary to the predictions made by7, which attribute the offset to a mechanism for reward maximization of the network approximation, and so predict that the offset has a dependency on reward rate, encompassing the reward values and inter-trial interval.

Secondly, the neural population activity should be near an (n − 1)-dimensional subspace during evidence accumulation. We found that evidence accumulation takes place on an (n − 1)-dimensional simplex with nonlinear decision boundaries. In this, we agree with7 that the neural population activity should be constrained to an (n − 1)-dimensional subspace during evidence accumulation, and that this could be tested with standard dimensionality-reduction techniques using multi-electrode recordings. However, there may be further subtlety in our case due to the effective degeneracy of the decision boundaries. The section “Implicit dynamics of optimal decision boundaries support both static and collapsing thresholds” showed that the decision variable can be transformed into a form that gives apparent temporal structure in the threshold (Fig. 8), dependent on the particular nonlinear boundary within the degenerate set. Our expectation is that this subtlety in the threshold dynamics may also manifest in that the accumulation manifold may appear to vary from subject to subject or trial to trial while maintaining near-optimal performance (given that any point in the accumulation manifold could also be on some decision boundary).