
Error running de.test.lrt: type <class 'tuple'> not recognized

Open cchrysostomou opened this issue 5 years ago • 2 comments

When I try to run the LRT function I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-378-462f6e08c1ed> in <module>
     17    data=data,
     18    full_formula_loc="1+condition",
---> 19    reduced_formula_loc="1"
     20 )

~/batchglm/diffxpy/diffxpy/testing/tests.py in lrt(data, full_formula_loc, reduced_formula_loc, full_formula_scale, reduced_formula_scale, as_numeric, init_a, init_b, gene_names, sample_description, noise_model, size_factors, batch_size, backend, train_args, training_strategy, quick_scale, dtype, **kwargs)
    412         quick_scale=quick_scale,
    413         dtype=dtype,
--> 414         **kwargs
    415     )
    416     full_model = _fit(

~/batchglm/diffxpy/diffxpy/testing/tests.py in _fit(noise_model, data, design_loc, design_scale, design_loc_names, design_scale_names, constraints_loc, constraints_scale, init_model, init_a, init_b, gene_names, size_factors, batch_size, backend, training_strategy, quick_scale, train_args, close_session, dtype)
    188         chunk_size_genes=chunk_size_genes,
    189         as_dask=backend.lower() in ["numpy"],
--> 190         cast_dtype=dtype
    191     )
    192 

~/batchglm/batchglm/models/base_glm/input.py in __init__(self, data, design_loc, design_loc_names, design_scale, design_scale_names, constraints_loc, constraints_scale, size_factors, observation_names, feature_names, chunk_size_cells, chunk_size_genes, as_dask, cast_dtype)
     94         design_loc, design_loc_names = parse_design(
     95             design_matrix=design_loc,
---> 96             param_names=design_loc_names
     97         )
     98         design_scale, design_scale_names = parse_design(

~/batchglm/batchglm/models/base_glm/utils.py in parse_design(design_matrix, param_names)
     39         params = None
     40     else:
---> 41         raise ValueError("type %s not recognized" % type(design_matrix))
     42 
     43     if param_names is not None:

ValueError: type <class 'tuple'> not recognized
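
The last frame shows only that `parse_design` received a `tuple` as `design_matrix` and had no branch for that type. The following is a minimal sketch of that failure mode, not batchglm's actual code: `parse_design_sketch` is a hypothetical stand-in, and the `(matrix, names)` pair is an assumed shape for the offending tuple, since the traceback does not show what the tuple contained.

```python
def parse_design_sketch(design_matrix):
    """Hypothetical stand-in for a type-dispatching parser (NOT batchglm's code)."""
    if isinstance(design_matrix, list):
        # Accepted type: a plain list of rows stands in for a real design matrix.
        return design_matrix
    # Same message format as the ValueError in the traceback above.
    raise ValueError("type %s not recognized" % type(design_matrix))


matrix = [[1.0, 0.0], [1.0, 1.0]]
names = ["Intercept", "condition"]
bundle = (matrix, names)  # assumed: a (matrix, names) pair passed where a matrix is expected

# Passing the whole tuple reproduces the kind of error reported in this issue.
try:
    parse_design_sketch(bundle)
except ValueError as err:
    message = str(err)

# Unpacking the pair first and passing only the matrix goes through.
unpacked_matrix, unpacked_names = bundle
parsed = parse_design_sketch(unpacked_matrix)
```

If the real cause is of this kind, the mismatch sits between the caller and `parse_design` inside the libraries, so it may not be something that can be fixed from the calling code in a notebook.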

This was the code block I ran to generate the error (versions I am using: batchglm v0.7.4+5.g31b905b, diffxpy v0.7.4+16.g3689ea8):

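import anndata
import numpy as np
import pandas as pd
import diffxpy.api as de
# Assumed import path; the location of Simulator differs between batchglm versions.
from batchglm.api.models.numpy.glm_nb import Simulator

sim = Simulator(num_observations=200, num_features=100)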
sim.generate_sample_description(num_batches=0, num_conditions=4)
sim.generate_params(
    rand_fn_loc=lambda shape: np.random.uniform(-0.1, 0.1, shape),
    rand_fn_scale=lambda shape: np.random.uniform(0.1, 2, shape)
)
sim.generate_data()
sim.x.min()

data = anndata.AnnData(
    X=sim.x,
    var=pd.DataFrame(index=["gene" + str(i) for i in range(sim.x.shape[1])]),
    obs=sim.sample_description
)

test_lrt = de.test.lrt(
   data=data, 
   full_formula_loc="1+condition",
   reduced_formula_loc="1"
)

Please let me know what I am doing incorrectly and how to properly run the code block above. Thanks!

cchrysostomou, May 13 '20 12:05

Hi @cchrysostomou, I recommend using the Wald test for now. I am still working on getting lrt to work in the new version. In my opinion, the Wald test is faster and preferable anyway!

davidsebfischer, May 18 '20 08:05
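
For readers following the Wald recommendation, here is a rough sketch of the equivalent call for the `condition` factor from the original post. It assumes the `formula_loc` and `factor_loc_totest` arguments of `de.test.wald` as used in the diffxpy tutorials. `wald_arguments` and `run_wald` are hypothetical helper names introduced only for this illustration, and the exact behavior may differ between diffxpy versions.

```python
def wald_arguments(condition_key="condition"):
    """Build keyword arguments for a Wald test on one factor (hypothetical helper)."""
    return {
        # Location model with an intercept plus the factor of interest.
        "formula_loc": "~ 1 + " + condition_key,
        # The factor whose coefficients are tested against zero.
        "factor_loc_totest": condition_key,
    }


def run_wald(data, condition_key="condition"):
    """Run diffxpy's Wald test on `data` (e.g. the AnnData object from the original post)."""
    # Imported lazily so this sketch can be loaded without diffxpy installed.
    import diffxpy.api as de

    return de.test.wald(data=data, **wald_arguments(condition_key))
```

With the `data` object built in the original post, `run_wald(data)` would replace the failing `de.test.lrt(...)` call, and `.summary()` on the result should give a per-gene table.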

@davidsebfischer, thanks for the response. I'll be sure to focus on using the Wald test. This next question might be better suited to a separate thread, but out of curiosity: what is the expected format of the input data? I looked through the tutorial, but it wasn't clear to me.

For example, should we assume that our scanpy object is already log-normalized and depth-normalized, or does the function expect raw read counts and handle any normalization and scaling itself?

cchrysostomou, May 19 '20 17:05