
Inconsistent behavior for level = "CMA" across census years

Open bdbmax opened this issue 11 months ago • 1 comments

Hello Jens,

I noticed some inconsistent behavior when using get_census(level = "CMA", ...) across different census years, and I wasn’t able to find documentation describing this pattern:

  • 1996/2001: Only CMAs are returned
  • 2006/2011/2016: Both CMAs and CAs are returned, but CAs with municipal code D return NA values for all vectors
  • 2021: Both CMAs and CAs are returned and include valid data

Here I request level = "CMA" for each census year and count the rows returned, which shows the jump between 2001 and 2006:

> CMAs <- lapply(
+   c("CA1996", "CA01", "CA06", "CA11", "CA16", "CA21"),
+   cancensus::get_census,
+   regions = list(C = "01"),
+   level = "CMA"
+ )
> sapply(CMAs, nrow)
[1]  43  46 152 155 157 160

And examples using the 2006 and 2016 datasets:

> cancensus::get_census("CA06", regions = list(C = "01"), level = "CMA", vectors = "v_CA06_103")[c(1,3,10)]
# A tibble: 152 × 3
   GeoUID `Region Name`           `v_CA06_103: Rented`
   <chr>  <fct>                                  <dbl>
 1 10001  St. John's (B)                         20115
 2 10005  Bay Roberts (D)                           NA
 3 10010  Grand Falls-Windsor (D)                   NA
 4 10015  Corner Brook (D)                          NA
> cancensus::get_census("CA16", regions = list(C = "01"), level = "CMA", vectors = "v_CA16_4838")[c(1,3,10)]
# A tibble: 157 × 3
   GeoUID `Region Name`           `v_CA16_4838: Renter`
   <chr>  <fct>                                   <dbl>
 1 10001  St. John's (B)                          25485
 2 10005  Bay Roberts (D)                            NA
 3 10010  Grand Falls-Windsor (D)                    NA
 4 10011  Gander (D)                                 NA
 5 10015  Corner Brook (D)                           NA

This is probably a methodological change on StatsCan's side. But when using cancensus, it breaks the assumption that level = "CMA" returns only CMA-level geographies with usable data. Could CMAs and CAs be separated when using level = "CMA"? And what causes the NA values for CAs with code D in 2006, 2011 and 2016? Alternatively, some documentation of the inconsistency would help (apologies if it already exists and I missed it).
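
For anyone hitting the same thing, here is a rough stopgap sketch rather than a proper fix. It uses a hand-typed mock of the five 2016 rows printed above (base R only, so it runs without an API key); the column names `region_name` and `renter` are placeholders I made up for the mock, not the names cancensus returns. It pulls the code in parentheses out of the region name, tabulates NA rows per code, and drops the rows without data. I am only assuming that the (B)/(D) suffix is a reliable way to tell the groups apart.

```r
# Mock of the first five CA16 rows shown above (values copied from that output).
# Column names here are placeholders, not the ones cancensus returns.
cma16 <- data.frame(
  GeoUID = c("10001", "10005", "10010", "10011", "10015"),
  region_name = c("St. John's (B)", "Bay Roberts (D)",
                  "Grand Falls-Windsor (D)", "Gander (D)",
                  "Corner Brook (D)"),
  renter = c(25485, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Extract the trailing single-letter code in parentheses, e.g. "B" or "D".
cma16$suffix <- sub("^.*\\(([A-Z])\\)$", "\\1", cma16$region_name)

# Cross-tabulate the code against whether the vector is NA.
na_by_suffix <- table(suffix = cma16$suffix, is_na = is.na(cma16$renter))
print(na_by_suffix)

# Stopgap: keep only the rows that actually carry data for this vector.
usable <- cma16[!is.na(cma16$renter), ]
print(usable)
```

Filtering on NA is the more conservative of the two options, since it does not depend on what the suffix codes mean, but it would also drop a region whose value is genuinely suppressed.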

Thanks for maintaining this package, it's well-designed and a powerful tool!

bdbmax · Mar 30 '25 19:03