ValueError: invalid entry in coordinates array with semester dates
Hello,
I have origin and development data in a semester format (e.g. 2021-S1). I converted them to dates by mapping YYYY-S1 to YYYY-01-01 and YYYY-S2 to YYYY-07-01. The resulting pandas DataFrame looks like this:

   origin      development     paid
0  2011-01-01  2011-01-01    179.74
1  2011-01-01  2011-07-01    664.94
2  2011-01-01  2012-01-01   7471.75
3  2011-01-01  2012-07-01    820.99
4  2011-01-01  2013-01-01    908.77

I then build the triangle with:
triangle = cl.Triangle(data=df_sub,
                       origin="origin",
                       origin_format="%Y-%m-%d",
                       development="development",
                       development_format="%Y-%m-%d",
                       columns="paid",
                       cumulative=True)
This raises:
ValueError: invalid entry in coordinates array
I am using the latest version of chainladder (pandas 1.4.2, numpy 1.22.4, chainladder 0.8.13).
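For reference, the semester-to-date conversion described above could look something like the following. This is a minimal sketch, not the code actually used: the helper name `semester_to_date` is hypothetical, and the input labels were derived from the first rows of the table using the stated S1 → January, S2 → July mapping.

```python
import pandas as pd


def semester_to_date(label: str) -> pd.Timestamp:
    """Hypothetical helper: map 'YYYY-S1'/'YYYY-S2' to the first day of that semester."""
    year, semester = label.split("-S")
    month = {"1": 1, "2": 7}[semester]  # S1 -> January, S2 -> July
    return pd.Timestamp(year=int(year), month=month, day=1)


raw = pd.DataFrame({
    "origin": ["2011-S1", "2011-S1", "2011-S1"],
    "development": ["2011-S1", "2011-S2", "2012-S1"],
    "paid": [179.74, 664.94, 7471.75],
})

# Convert both date columns in place
for col in ["origin", "development"]:
    raw[col] = raw[col].map(semester_to_date)

print(raw)
```

Anchoring each semester on its first day keeps the dates compatible with an `origin_format`/`development_format` of "%Y-%m-%d", as in the triangle call above.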
This should work. Marking as bug.
import io

import chainladder as cl
import pandas as pd

# Minimal reproducible example: raises "ValueError: invalid entry in
# coordinates array" on chainladder 0.8.13
df_sub = pd.read_csv(
    io.StringIO("""
2011-01-01, 2011-01-01, 179.74
2011-01-01, 2011-07-01, 664.94
2011-01-01, 2012-01-01, 7471.75
2011-01-01, 2012-07-01, 820.99
2011-01-01, 2013-01-01, 908.77
"""),
    names=['origin', 'development', 'paid'],
    skipinitialspace=True,  # ignore the space after each comma
    parse_dates=['origin', 'development'],
)

triangle = cl.Triangle(
    data=df_sub,
    origin="origin",
    origin_format="%Y-%m-%d",
    development="development",
    development_format="%Y-%m-%d",
    columns="paid",
    cumulative=True,
)
Thank you for your answer, but sadly your code does not seem to work; I get exactly the same error:
Traceback (most recent call last):
  File "C:\Users\xxxxxxxx\AppData\Local\Temp\ipykernel_2108\780531685.py", line 1, in <cell line: 1>
    triangle = cl.Triangle(data=df_sub,
  File "C:\Users\xxxxxxxxx\.conda\envs\chain_ladder\lib\site-packages\chainladder\core\triangle.py", line 218, in __init__
    self.values = num_to_nan(
  File "C:\Users\xxxxxxxx\.conda\envs\chain_ladder\lib\site-packages\chainladder\utils\utility_functions.py", line 294, in num_to_nan
    return num_to_value(arr, xp.nan)
  File "C:\Users\xxxxxx\.conda\envs\chain_ladder\lib\site-packages\chainladder\utils\utility_functions.py", line 279, in num_to_value
    arr = sp(
  File "C:\Users\xxxxxx\.conda\envs\chain_ladder\lib\site-packages\sparse\_coo\core.py", line 296, in __init__
    self._sort_indices()
  File "C:\Users\xxxxxx\.conda\envs\chain_ladder\lib\site-packages\sparse\_coo\core.py", line 1244, in _sort_indices
    linear = self.linear_loc()
  File "C:\Users\xxxxx\.conda\envs\chain_ladder\lib\site-packages\sparse\_coo\core.py", line 940, in linear_loc
    return linear_loc(self.coords, self.shape)
  File "C:\Users\xxxx\.conda\envs\chain_ladder\lib\site-packages\sparse\_coo\common.py", line 65, in linear_loc
    return np.ravel_multi_index(coords, shape)
  File "<array_function internals>", line 180, in ravel_multi_index
ValueError: invalid entry in coordinates array
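The last frames show where the message originates: `sparse` calls `np.ravel_multi_index(coords, shape)`, and NumPy raises this exact error when any coordinate lies outside the given shape. The snippet below reproduces that NumPy behaviour in isolation with a made-up 2 × 3 shape; the helper `flat_positions` is hypothetical. A plausible reading, though not confirmed at this point in the thread, is that the triangle builder computes an origin or development index past the end of its own axes for semester data.

```python
import numpy as np


def flat_positions(coords, shape):
    """Hypothetical helper: flatten (row, col) coordinates into linear positions."""
    return np.ravel_multi_index(coords, shape)


shape = (2, 3)  # e.g. 2 origin periods x 3 development lags

# All coordinates in range: (0, 2) -> 0*3 + 2 = 2 and (1, 0) -> 1*3 + 0 = 3
print(flat_positions(([0, 1], [2, 0]), shape))  # [2 3]

# Column index 3 does not exist in a 3-column array, so NumPy refuses it
try:
    flat_positions(([0, 1], [2, 3]), shape)
except ValueError as err:
    print(err)  # invalid entry in coordinates array
```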
Yes, I know it doesn't work; sorry for not being clearer. I just reformatted your code into a reprex for testing purposes. This is a bug.
I'm trying to address this bug, but I see this piece of code:
if len(dates.unique()) == 1:
    grain = 'M'
Is there a reason why, when there is only one unique date, the default grain is set to monthly? I would expect annual: wouldn't we want the least amount of granularity?
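To make the "least granularity" idea concrete, here is a sketch of how a grain could be inferred from the spacing between distinct dates, falling back to annual when there is only one date. This is not chainladder's grain detection: the helper `infer_grain` and the return codes 'Y', 'S', 'Q', 'M' are hypothetical labels chosen for this example.

```python
from functools import reduce
from math import gcd

import pandas as pd


def infer_grain(dates: pd.Series) -> str:
    """Hypothetical helper: coarsest grain whose step divides every gap between dates.

    Returns 'Y' (annual), 'S' (semester), 'Q' (quarterly) or 'M' (monthly).
    """
    idx = pd.DatetimeIndex(dates.unique())
    # Absolute month number for each distinct date (year * 12 + month offset)
    months = sorted({int(m) for m in idx.year * 12 + idx.month - 1})
    if len(months) == 1:
        return "Y"  # one date carries no spacing information: assume the coarsest grain
    # Largest number of months that divides every gap between consecutive dates
    step = reduce(gcd, (b - a for a, b in zip(months, months[1:])))
    for size, grain in ((12, "Y"), (6, "S"), (3, "Q")):
        if step % size == 0:
            return grain
    return "M"


semester_dates = pd.Series(pd.to_datetime(["2011-01-01", "2011-07-01", "2012-01-01"]))
print(infer_grain(semester_dates))  # S
```

Using the greatest common divisor of the gaps means a single missing period (for example a 12-month jump inside otherwise 6-month data) does not push the inferred grain coarser than the data supports.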
Mind if I take this one? The triangle instantiation has layers of special-case handling, and the code is difficult to reason about. I think a refactor is in order so that it is easier to support broader contributions.
I think I already got this one; it has to do with the "2Q-DEC" frequency. You can look at my branch:
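The actual change lives on the linked branch and is not reproduced here. As a rough illustration of treating a semester as two calendar quarters (the December year-end convention that a "Q-DEC" anchor denotes), the sketch below maps each date to a consecutive semester number via its quarter. It then derives 0-based origin and development-lag indices that stay inside the triangle's shape. The helpers `semester_number` and `triangle_coords` are hypothetical and are not chainladder code.

```python
import numpy as np
import pandas as pd


def semester_number(dates: pd.Series) -> pd.Series:
    """Consecutive semester number: two calendar quarters per semester."""
    # Quarters 1-2 -> first half of the year (0), quarters 3-4 -> second half (1)
    return dates.dt.year * 2 + (dates.dt.quarter - 1) // 2


def triangle_coords(df: pd.DataFrame):
    """Hypothetical helper: 0-based origin and development-lag indices in semesters."""
    origin = semester_number(df["origin"])
    development = semester_number(df["development"])
    origin_idx = origin - origin.min()
    lag_idx = development - origin
    return origin_idx, lag_idx


df = pd.DataFrame({
    "origin": pd.to_datetime(["2011-01-01"] * 5),
    "development": pd.to_datetime(
        ["2011-01-01", "2011-07-01", "2012-01-01", "2012-07-01", "2013-01-01"]
    ),
    "paid": [179.74, 664.94, 7471.75, 820.99, 908.77],
})

origin_idx, lag_idx = triangle_coords(df)
shape = (int(origin_idx.max()) + 1, int(lag_idx.max()) + 1)

# Every coordinate is within the shape, so ravel_multi_index succeeds
flat = np.ravel_multi_index((origin_idx.to_numpy(), lag_idx.to_numpy()), shape)
print(lag_idx.tolist(), shape, flat.tolist())  # [0, 1, 2, 3, 4] (1, 5) [0, 1, 2, 3, 4]
```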
https://github.com/casact/chainladder-python/blob/294/chainladder/core/base.py#L201