Error running de.test.lrt: type <class 'tuple'> not recognized
When I try to run the LRT function I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-378-462f6e08c1ed> in <module>
17 data=data,
18 full_formula_loc="1+condition",
---> 19 reduced_formula_loc="1"
20 )
~/batchglm/diffxpy/diffxpy/testing/tests.py in lrt(data, full_formula_loc, reduced_formula_loc, full_formula_scale, reduced_formula_scale, as_numeric, init_a, init_b, gene_names, sample_description, noise_model, size_factors, batch_size, backend, train_args, training_strategy, quick_scale, dtype, **kwargs)
412 quick_scale=quick_scale,
413 dtype=dtype,
--> 414 **kwargs
415 )
416 full_model = _fit(
~/batchglm/diffxpy/diffxpy/testing/tests.py in _fit(noise_model, data, design_loc, design_scale, design_loc_names, design_scale_names, constraints_loc, constraints_scale, init_model, init_a, init_b, gene_names, size_factors, batch_size, backend, training_strategy, quick_scale, train_args, close_session, dtype)
188 chunk_size_genes=chunk_size_genes,
189 as_dask=backend.lower() in ["numpy"],
--> 190 cast_dtype=dtype
191 )
192
~/batchglm/batchglm/models/base_glm/input.py in __init__(self, data, design_loc, design_loc_names, design_scale, design_scale_names, constraints_loc, constraints_scale, size_factors, observation_names, feature_names, chunk_size_cells, chunk_size_genes, as_dask, cast_dtype)
94 design_loc, design_loc_names = parse_design(
95 design_matrix=design_loc,
---> 96 param_names=design_loc_names
97 )
98 design_scale, design_scale_names = parse_design(
~/batchglm/batchglm/models/base_glm/utils.py in parse_design(design_matrix, param_names)
39 params = None
40 else:
---> 41 raise ValueError("type %s not recognized" % type(design_matrix))
42
43 if param_names is not None:
ValueError: type <class 'tuple'> not recognized
This is the code block I ran to generate the error (versions: batchglm v0.7.4+5.g31b905b, diffxpy v0.7.4+16.g3689ea8):
# Imports assumed for the snippet below; the Simulator import path may
# differ depending on the batchglm version.
import numpy as np
import pandas as pd
import anndata
import diffxpy.api as de
from batchglm.api.models.glm_nb import Simulator

# Simulate a small negative-binomial data set with 4 conditions.
sim = Simulator(num_observations=200, num_features=100)
sim.generate_sample_description(num_batches=0, num_conditions=4)
sim.generate_params(
    rand_fn_loc=lambda shape: np.random.uniform(-0.1, 0.1, shape),
    rand_fn_scale=lambda shape: np.random.uniform(0.1, 2, shape)
)
sim.generate_data()
sim.x.min()

# Wrap the simulated counts in an AnnData object.
data = anndata.AnnData(
    X=sim.x,
    var=pd.DataFrame(index=["gene" + str(i) for i in range(sim.x.shape[1])]),
    obs=sim.sample_description
)

# Likelihood-ratio test: full model with condition vs. intercept-only reduced model.
test_lrt = de.test.lrt(
    data=data,
    full_formula_loc="1+condition",
    reduced_formula_loc="1"
)
Please let me know what I am doing incorrectly and how to properly run the code block above. Thanks!
Hi @cchrysostomou, I recommend using the Wald test for now; I am still working on getting lrt to work in the new version. Wald is faster and, in my opinion, preferable anyway!
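For reference, here is a minimal sketch of how the same comparison could be run with the Wald test instead, assuming the `data` AnnData object built above; it uses diffxpy's `de.test.wald` with `formula_loc` and `factor_loc_totest`:

# Wald test on the "condition" factor, analogous to the LRT comparison above.
test_wald = de.test.wald(
    data=data,
    formula_loc="~1 + condition",
    factor_loc_totest="condition"
)
# Results table (p-values, fold changes) as a pandas DataFrame.
test_wald.summary().head()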
@davidsebfischer Thanks for the response; I'll focus on using Wald. This next question might be better served in a separate thread, but out of curiosity, what is the expected format of the input data? I tried looking through the tutorial but it wasn't entirely clear.
For example, should the scanpy object already be log-normalized and depth-normalized, or does the function expect raw read counts and handle any normalization and scaling internally?