
Single-season, multi-species false-positive model 'Semantic Error' in Flocker

Open skeyser opened this issue 1 year ago • 5 comments

Hi Jacob,

I'm fitting a single-season, multi-species false-positive occupancy model in flocker with data collected from ARUs, following the publication "A scalable and transferable approach to combining emerging conservation technologies to identify biodiversity change after large disturbances" (Wood et al., 2023) and the associated code in the appendix. I have data for a single year with 5 repeated samples, plus an FP N x J matrix matching the observation data.

I have all of the data packaged up via make_flocker_data() for a single-season, multi-species FP model (the call is sketched below, and a snippet of the packaged data follows the error). flocker was installed from the @alt-fp development branch.
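For reference, the packaging call looks roughly like this. This is a sketch with toy shapes: the fp_data argument and its element names (block, frac_true, P, QQ) are my reading of the branch, per the fp_data$* inputs discussed further down this thread, not documented API.

```r
# Sketch of the packaging call (fp_data and its element names are my reading
# of the alt-fp branch, not documented API).
library(flocker)  # installed from the @alt-fp development branch

# Placeholder shapes: N units x J = 5 visits
N <- 10; J <- 5
obs       <- matrix(rbinom(N * J, 1, 0.3), N, J)  # putative encounter histories
block     <- matrix(1L, N, J)                     # block IDs, same shape as obs
frac_true <- matrix(0.9, N, J)                    # per-block fraction of true detections
P         <- matrix(0.8, N, J)                    # see discussion later in the thread
QQ        <- matrix(0.2, N, J)
ec1       <- matrix(rnorm(N * J), N, J)           # event covariates
ec2       <- matrix(rnorm(N * J), N, J)
unit_covs <- data.frame(ele = rnorm(N))           # unit covariates

fd <- make_flocker_data(
  obs        = obs,
  unit_covs  = unit_covs,
  event_covs = list(ec1 = ec1, ec2 = ec2),
  fp_data    = list(block = block, frac_true = frac_true, P = P, QQ = QQ)
)
```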

However, running a flocker model breaks with the following error:

```
Semantic error in 'C:/XXXX/AppData/Local/Temp/Rtmp0kdk36/model-550c27f2c30.stan', line 113, column 10 to column 78:
   111:      lp += log_sum_exp(
   112:        bernoulli_logit_lpmf(1 | occ[i]) +
   113:          emission_1_fp(to_row_vector(mu[indices]), vr[indices], vor[indices]),
               ^
   114:        bernoulli_logit_lpmf(0 | occ[i]) +
   115:          emission_0_fp(vr[indices])

Ill-typed arguments supplied to function 'emission_1_fp':
  (row_vector, array[] real, array[] real)
Available signatures:
  (row_vector, row_vector, row_vector) => real
The second argument must be row_vector but got array[] real.
```

I'm a bit confused, given that the data are pre-packaged by make_flocker_data(), so I'm not sure why an array is being passed where a row vector is expected (assuming that's what's happening here).

Any ideas? Below is the flocker object.

```
List of 6
 $ data      :'data.frame': 638730 obs. of 29 variables:
  ..$ ff_y         : num [1:638730] 0 0.183 0 0 0.586 0.378 0.649 0.298 0 0.251 ...
  ..$ ff_nblocks   : int [1:638730] 94 94 94 94 94 94 94 94 94 94 ...
  ..$ ff_block     : int [1:638730] 1 1 1 1 1 1 1 1 1 1 ...
  ..$ ff_frac_true : num [1:638730] 0.968 0.968 0.968 0.968 0.968 ...
  ..$ ff_P         : num [1:638730] -99 0.229 -99 -99 0.398 ...
  ..$ ff_QQ        : num [1:638730] -99 0.568 -99 -99 0.375 ...
  ..$ ID           : chr [1:638730] "C2065_U1" "C2134_U2" "C2203_U2" "C4178_U2" ...
  ..$ Long         : num [1:638730] -0.743 -0.744 -0.753 0.123 0.164 ...
  ..$ Lat          : num [1:638730] 0.67 0.647 0.611 -0.243 -0.254 ...
  ..$ ele          : num [1:638730] -0.493 -0.389 -1.126 0.224 0.157 ...
  ..$ ppt          : num [1:638730] 1.037 1.273 1.037 -0.163 -0.196 ...
  ..$ tmx          : num [1:638730] 0.8167 0.4371 0.962 -0.1369 -0.0665 ...
  ..$ cbi1_5       : num [1:638730] -0.348 -0.348 -0.348 -0.348 -0.348 ...
  ..$ cbi6_10      : num [1:638730] -0.316 -0.316 -0.316 -0.316 -0.316 ...
  ..$ cbi11_35     : num [1:638730] -0.368 -0.368 -0.368 -0.368 -0.368 ...
  ..$ stage        : num [1:638730] 0.2368 -0.3326 -0.033 0.0177 0.0624 ...
  ..$ cc           : num [1:638730] 1.285 0.291 1.15 0.207 0.354 ...
  ..$ species      : chr [1:638730] "Acorn Woodpecker" "Acorn Woodpecker" "Acorn Woodpecker" "Acorn Woodpecker" ...
  ..$ ec1          : num [1:638730] 42 42 42 30 30 30 30 30 30 30 ...
  ..$ ec2          : num [1:638730] 6 6 6 5 5 5 5 5 5 5 ...
  ..$ ff_n_unit    : num [1:638730] 150212 -99 -99 -99 -99 ...
  ..$ ff_n_rep     : num [1:638730] 3 3 4 4 4 4 4 4 4 4 ...
  ..$ ff_Q         : num [1:638730] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ ff_unit      : int [1:638730] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ ff_rep_index1: num [1:638730] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ ff_rep_index2: num [1:638730] 150213 150214 150215 150216 150217 ...
  ..$ ff_rep_index3: num [1:638730] 300425 300426 300427 300428 300429 ...
  ..$ ff_rep_index4: num [1:638730] -99 -99 446689 446690 446691 ...
  ..$ ff_rep_index5: num [1:638730] -99 -99 -99 -99 -99 -99 -99 -99 -99 -99 ...
 $ n_rep     : int 5
 $ type      : chr "single"
 $ unit_covs : chr [1:12] "ID" "Long" "Lat" "ele" ...
 $ event_covs: chr [1:2] "ec1" "ec2"
 $ fp        : logi TRUE
 - attr(*, "class")= chr [1:2] "list" "flocker_data"
```

skeyser avatar Apr 11 '25 19:04 skeyser

Sorry I haven't been able to take a look until now. I haven't spent time on that branch in a while, but I'll try to take a look in the next week or two. Probably what happened here is that I modified the emission likelihoods to work with the multi-season format and didn't notice that this broke the single-season format. Since this branch isn't in production, I don't have good unit tests for this stuff on the fp models.

jsocolar avatar May 10 '25 00:05 jsocolar

Thanks, Jacob! I appreciate the response and your effort in getting this bug sorted out over the coming weeks. I'll keep an eye on this issue in the meanwhile.

Curious whether there are any plans to merge this branch into the main production branch of the package? It seems like the bioacoustics community (and maybe the broader ecological community?) would benefit from access to these FP model flavors in Stan. This seems like a valuable addition to your already awesome package.

skeyser avatar May 12 '25 12:05 skeyser

Take a look now and see if the single-season fp models are working. I think I found and fixed the problem, but I don't have data on hand with which to test them.

Clearly, the single-season version of this model was not sufficiently tested; it would probably be good for me to do a full simulation-based calibration of the model to make sure it's now actually bug-free, even if it appears on its face to work for you. I'll try to do that sometime in the next couple of weeks, but I can't guarantee that I'll find the time.

Unfortunately, I don't have the bandwidth to maintain a feature-complete version of the fp models (i.e. to build out the post-processing functionality that is standard for other flocker model types). I've also found these models to be conceptually quite challenging, and I'm not even sure how to implement some of the post-processing tools. For this reason, I don't currently see a pathway to getting these models merged into the main branch of the package. However, I'll try to commit to maintaining them on this branch in a state where they consistently work.

jsocolar avatar May 16 '25 04:05 jsocolar

@skeyser just want to make sure you see this--any luck with the latest version?

jsocolar avatar May 29 '25 03:05 jsocolar

Thanks for the reminder @jsocolar! I got sidetracked with some other projects over the past two weeks. I'll test this out in the next few days and report back.

skeyser avatar May 30 '25 03:05 skeyser

Hi @jsocolar, sorry it took me a while to get around to this. I just ran the code I had written, and it looks like the bug is fixed on the latest flocker@alt-fp branch. I'll re-run with a smaller chunk of data to test whether the model converges or anything else creates a hang-up, but I'll close this issue for now.

skeyser avatar Jun 12 '25 15:06 skeyser

Hi @jsocolar, just to follow up in a little more detail, without reopening the issue (since the bug seems fixed).

I also find the FP models conceptually difficult to wrap my head around. As such, I had a question about some of the documentation on the fp_data argument accompanying the FP models. Like the publication I referenced earlier in this thread, I am working with BirdNET to classify species detections from acoustic samples, following guidance from Wood, Socolar, et al. Appendix 3. One of my biggest confusions at the moment is generating the data for the pp, qq, and frac_true parameters in the fp_data argument. While this argument is not specified in Appendix 3, it is used in the code associated with the manuscript.

I was hoping you might be able to provide some clarity on how these values should be derived. Following the code, I saw that prior to fitting the flocker model, a separate brms categorical model is fitted using binned Pr(True Positive) as a function of either species or species x year combinations, called "blocks" in this case. For my specific case, the blocks are simply species, which enters the categorical model as a random intercept. The categorical model predicts the mean probability that an $obs_i$ for $species_k$ falls into Pr(TP) $bin_j$; this is the $frac$ parameter in fp_data, which corresponds to $D(p)$? pp and qq refer to Pr(TP) and Pr(FP), respectively, and are derived from $p$, the probability of $obs_i$ falling into $bin_j$ for $species_k$, and $q = 1 - p$, correct? Does this seem like the right interpretation and logic? (For concreteness, I've sketched the calibration model I mean below.)
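A rough sketch of that calibration model, as I understand it from the manuscript code. All variable and column names here are mine, not flocker or manuscript API:

```r
# Sketch of the calibration model described above (my variable names).
# One row of `dets` per putative BirdNET detection; tp_bin is the binned
# Pr(TP), a factor with J levels.
library(brms)

dets <- data.frame(
  species = sample(c("ACWO", "STJA"), 500, replace = TRUE),
  tp_bin  = factor(sample(1:5, 500, replace = TRUE))
)

cal_fit <- brm(
  tp_bin ~ 1 + (1 | species),  # blocks = species in my case
  data   = dets,
  family = categorical()
)

# Per-species mean probability of landing in each bin:
frac <- fitted(cal_fit, newdata = data.frame(species = c("ACWO", "STJA")))
```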

Also, in your experience, what is the run time on these models? I was testing this out for a single species and single year across the ~1600 ARUs we have, with a very simple structure (1 event covariate and 1 unit covariate; no random effects), and the model has been struggling to converge (extreme values for the unit-covariate beta estimates, low n_eff, and high Rhats). I increased adapt_delta and max_treedepth to try to help, but the model has only achieved about 25% of the sampling over multiple days, and this is for a simple model (roughly the call sketched below).
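For reference, roughly the call I'm running (simplified, and hedged: exact argument handling on the alt-fp branch may differ; extra arguments should pass through to brms):

```r
# Simplified sketch of the fit; fd comes from make_flocker_data() with
# fp_data, as earlier in this thread.
fit <- flock(
  f_occ        = ~ ele,   # one unit covariate
  f_det        = ~ ec1,   # one event covariate
  flocker_data = fd,
  control      = list(adapt_delta = 0.95, max_treedepth = 12),
  chains = 4, cores = 4
)
```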

Lastly, I know that this might be a niche topic and I don't want to inflate the GitHub issues section for flocker with issues on the @alt-fp dev branch, so, happy to continue this conversation elsewhere if you prefer!

skeyser avatar Jun 20 '25 17:06 skeyser

Howdy @skeyser! Sorry it's taken me so long to respond. In general feel free to pester me when I go radio silent.

I have an answer to exactly what the data inputs should be, and also an answer to why your model seems to be going haywire, which is that there is a substantial (but easily fixable) bug in the single-season implementation of the fp models. This was not caught previously because the single-season versions were never tested--for Connor I went straight to the multi-season versions. The fix is trivial, and I will push it later today (but first I have a big block of meetings). Then I will post a detailed description of how to use the model inputs here.

jsocolar avatar Aug 05 '25 13:08 jsocolar

Hi there @jsocolar! Thanks so much for the response.

I'll be sure to pester you if/when issues pop back up, but I understand you are busy and I appreciate your assistance. I am very much looking forward to the fix and follow-up.

skeyser avatar Aug 05 '25 15:08 skeyser

Ok, I've pushed a commit that I think should fix the bug. Note that I still haven't unit-tested this code nor performed SBC on the single-season fp model... and I don't expect to have time to do so. But if it now gives output that doesn't look like nonsense, that's a good sign!

As far as the inputs go, here's the summary. First of all, the whole fp functionality is designed in an insane way--I was sprinting and iterating to get Connor something that would work, and I never took the time once it was all done to clean up some of the stupider parts of the code. That makes it especially hard to understand the inputs, because they don't make any sense (some of them are redundant).

In particular, in a sane version of the code we would pass three things: the putative encounter histories with putative detections coded as the classifier-predicted probability that the detection is true, the block structure of the data, and the probability density function for the distribution of classifier-predicted probabilities associated with putative detections (I'll explain what this means below). However, because I didn't understand that several quantities were redundant when I began coding this up, I've ended up requiring 5 inputs where just these 3 inputs should be necessary.

To explain the 5 inputs, let's first introduce another quantity $\rho$ that is not a required input, but is used to compute some of the inputs.

  • A bit of preliminary explanation:
    • The crux of these fp models is that they look just like normal occupancy models, but then, conditional on a true non-detection, we write down a likelihood for what the observation will be.
    • The possibilities are either that we observe a putative non-detection, or that we observe a false positive.
    • In the latter case, we also observe the associated confidence score from the classifier.
  • We assume that we can associate each score from the classifier with a probability $\rho$, the unconditional probability that the putative detection is real (NOT a false positive). This probability is unconditional in the sense that it does not condition on the covariates associated with the observation, nor on the putative detection history of other visits at the site.
  • To estimate $\rho$, we perform a calibration procedure to map classifier scores to probabilities that the putative detection corresponds to a true detection. In some cases, it makes sense to perform this calibration in blocks (e.g. species or years) so as to avoid assuming that the map from classifier scores to probabilities is constant across the blocks. We neglect the uncertainty in $\rho$ arising from the calibration procedure and treat our estimates of $\rho$ as fixed. (A minimal example of such a calibration follows this list.)
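Purely as illustration, one minimal version of such a calibration is a per-block logistic regression of hand-verified outcomes on the classifier score. This step happens outside flocker, and every name below is mine:

```r
# Illustrative only (not flocker API): estimate rho via per-block logistic
# regressions of hand-verified outcomes on the classifier score.
set.seed(1)
verified <- data.frame(                      # hand-checked subsample
  block   = rep(c("A", "B"), each = 100),
  score   = runif(200),
  is_true = rbinom(200, 1, 0.7)
)
dets <- data.frame(                          # all putative detections
  block = rep(c("A", "B"), each = 500),
  score = runif(1000)
)

fits <- lapply(split(verified, verified$block),
               function(d) glm(is_true ~ score, family = binomial(), data = d))
dets$rho <- NA_real_
for (b in names(fits)) {
  i <- dets$block == b
  dets$rho[i] <- predict(fits[[b]], newdata = dets[i, ], type = "response")
}
```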

So with that said, the 5 inputs required by the current implementation of the model are:

  • obs: the putative encounter history. It doesn't matter if this is coded as zeros and ones or as zeros and the classifier-predicted probabilities $\rho$. Both work; the information needed about $\rho$ is redundantly coded elsewhere.
  • fp_data$block: the block structure of the encounter data. This should be provided as a matrix in the same shape as obs.
  • fp_data$frac_true: Expected fraction of putative detections that are true. There will be one unique value of this for each block, but the data format wants it passed as a matrix in the same shape as the block matrix, but with the block IDs replaced with the fractions. This fraction can be computed as follows. Within each block, sum $\rho$ to get the expected number of true detections, and divide by the total number of putative detections to get frac_true.
  • fp_data$P: This is the likelihood contribution associated with a putative detection given that there is a true detection. By definition, when there is a true detection, there will also be a putative detection. However, the likelihood function must also account for the likelihood of observing a specific value of $\rho$ conditional on a true detection. We can approximate this likelihood contribution in two steps. First, we approximate the probability density function for the distribution of classifier scores across all putative detections (lumping the true positives and the false positives). We can approximate this PDF empirically by binning or smoothing. Denote this PDF as $f(\rho)$. Note that the density of true positives will now be the normalization of $\rho f(\rho)$. The density of false positives will be the normalization of $(1-\rho)f(\rho)$. fp_data$P is the density of true positives associated with each observation (i.e. the normalization of $\rho f(\rho)$), passed in the same shape as obs. These operations must be carried out per block, i.e. $f$ is estimated per block (see the sketch just after this list).
  • fp_data$QQ: This is the likelihood contribution associated with a putative detection given that there is no true detection. It is the density of false positives arrived at by normalizing $(1-\rho)f(\rho)$ (see above)
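To make the binned construction concrete, here is a rough sketch of computing all three derived inputs per block. Nothing here is flocker API; binning is just one way to approximate $f$, and the variable names are mine. The resulting columns would then be arranged into matrices shaped like obs (with placeholders such as -99 at putative non-detections, as in the flocker_data object earlier in this thread).

```r
# Rough sketch (not flocker API): per-block frac_true, P, and QQ for a set of
# putative detections, approximating f(rho) by binning.
compute_fp_inputs <- function(rho, block, n_bins = 20) {
  breaks <- seq(0, 1, length.out = n_bins + 1)
  width  <- diff(breaks)                                 # bin widths
  mids   <- (head(breaks, -1) + tail(breaks, -1)) / 2    # bin midpoints
  out <- data.frame(rho, block,
                    frac_true = NA_real_, P = NA_real_, QQ = NA_real_)
  for (b in unique(block)) {
    i   <- which(block == b)
    bin <- findInterval(rho[i], breaks, rightmost.closed = TRUE)
    # empirical density f(rho) across all putative detections in the block
    f <- tabulate(bin, nbins = n_bins) / (length(i) * width)
    # frac_true: expected true detections / total putative detections
    out$frac_true[i] <- sum(rho[i]) / length(i)
    p_dens <- mids * f                         # density of true positives
    p_dens <- p_dens / sum(p_dens * width)     # ... normalized
    q_dens <- (1 - mids) * f                   # density of false positives
    q_dens <- q_dens / sum(q_dens * width)     # ... normalized
    out$P[i]  <- p_dens[bin]
    out$QQ[i] <- q_dens[bin]
  }
  out
}

# Example: 1000 putative detections in two blocks
set.seed(1)
demo <- compute_fp_inputs(rho = rbeta(1000, 5, 2),
                          block = rep(c("A", "B"), each = 500))
```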

Note that it is not the case that P = 1 - QQ, but it is the case that P contains a factor of $\rho$ while QQ contains a factor of $1 - \rho$.

If I had time to clean this up, I would refactor so that we encode $\rho$ in obs, pass block, and pass the probability density function $f$ evaluated at each observation. From this, it would be possible to compute frac_true, P, and QQ internally. Ah well.
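(In that hypothetical version, the internals would reduce to something like this sketch, with one entry of $\rho$ and $f$ per putative detection:)

```r
# Hypothetical internals of the refactor described above; illustrative only.
rho   <- c(0.9, 0.4, 0.7)   # classifier-derived probabilities, coded in obs
f     <- c(1.2, 0.8, 1.0)   # f(rho) evaluated at each putative detection
block <- c(1, 1, 1)
frac_true <- ave(rho, block, FUN = function(r) sum(r) / length(r))
P  <- rho * f        # then normalized within block, as described above
QQ <- (1 - rho) * f  # then normalized within block, as described above
```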

jsocolar avatar Aug 05 '25 23:08 jsocolar

As I've gone back through all of this, I also want to re-emphasize that this model hinges on a set of approximations. In limited testing, these approximations seem to work ok in practice, but I want to make sure you and anyone else go into this with eyes wide open. These approximations are:

  • we know $\rho$ exactly. (In reality we derive $\rho$ through calibration regressions that carry associated uncertainty.)
  • the per-block expected fraction of detections that are true in the observed data (the sum of $\rho$ divided by the number of putative detections) can be used to exactly represent the fraction of detections that are expected to be true under the generative process. In reality, the observed expectation will be some sample from the generative process, and I don't have a strong argument to support the claim that this sample would have low variance.
  • We know $f$ exactly (per block).
  • Conditional on a true non-detection, false positives occur completely at random within blocks.

jsocolar avatar Aug 05 '25 23:08 jsocolar

Since I've been turning this over in my head for the last couple of days, I want to make a quick post that ties this all together succinctly. The fp model is supposed to encapsulate a generative model that is exactly like a standard occupancy model but with the following modifications:

  1. when the generative model yields a detection, it associates with that detection some classifier score. To add this component to the model, we need to know $\gamma$, the sampling distribution of classifier scores conditional on a true detection. (Both components are restated in symbols just after this list.)
  2. when the generative model yields a non-detection (which I refer to as a "true non-detection", subsuming non-detections that are correctly observed and non-detections that yield false-positives), with some fixed (block-specific) probability it yields a false positive and associates with that detection some classifier score. To add this component to the model, we need to know $\eta$, the probability of a false-positive conditional on a true non-detection, and $\delta$, the sampling distribution of classifier scores conditional on a false-positive.
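In symbols (my notation, with $s$ the classifier score attached to a putative detection):

$$\Pr(\text{obs} \mid \text{true detection}) = \gamma(s), \qquad \Pr(\text{obs} \mid \text{true non-detection}) = \begin{cases} 1 - \eta & \text{putative non-detection} \\ \eta\,\delta(s) & \text{false positive with score } s \end{cases}$$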

The challenge for us is that the classifier scores are a black box, so we don't have a generative model for the scores. We need a way to estimate $\gamma$, $\delta$, and $\eta$.

The fp solution to this problem is to apply some clever tricks to approximate these quantities empirically from the data. Caution: these are just approximations, and their reliability has not been studied!

All three of these approximations rely on our ability to use a calibration procedure to estimate $\rho$, the unconditional probability that a putative detection is true, conditional on its classifier score (and the block to which it belongs). From now on, everything we do will happen within blocks, and we will repeat this entire procedure for each block.

  1. We empirically approximate the distribution of the classifier scores across all putative detections, and we treat this as the generative distribution for the classifier scores (lumping together the true detections and the false detections). Call this distribution $\omega$.
  2. We decompose $\omega$ into $\gamma$ (the distribution conditional on a true detection) and $\delta$ (the distribution conditional on a false-positive). In particular, $\gamma \propto \rho \omega$, while $\delta \propto (1 - \rho)\omega$. We normalize these empirical densities to get our conditional sampling distributions for the classifier scores.
  3. Now we just need $\eta$. Again, we use an approximation (actually, three of them). The total number of false detections in the data ($\nu$) is expected to be roughly the sum of $1 - \rho$. This is our first approximation, and like the previous two approximations, it happens outside of model fitting. The total number $\phi$ of true non-detections is expected to be the number of observed zeros plus $\nu$. However, for reasons that I no longer understand (if I ever did), I recall it being important in practice to approximate $\phi$ not as the observed zeros plus $\nu$, but rather as the expected number of non-detections conditional on the model parameters. So internal to the likelihood computation, we calculate the expected number of true non-detections based on the parameters, and use that to approximate $\phi$. This is our second approximation. Finally, we approximate $\eta$ as $\frac{\nu}{\phi}$. This is our third approximation. (All three are collected in symbols just below.)
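Collecting those three approximations in symbols:

$$\nu \approx \sum_{\text{putative detections}} (1 - \rho), \qquad \phi \approx \mathbb{E}\left[\#\{\text{true non-detections}\} \mid \text{model parameters}\right], \qquad \eta \approx \frac{\nu}{\phi}$$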

This is the fp model. In my limited testing, it seems to perform well enough, and if anything to yield conservative credible intervals. But your mileage may vary. Because I don't have time to adequately and deeply study this model, it has languished on an unmaintained branch of flocker.

jsocolar avatar Aug 07 '25 22:08 jsocolar