
measurement_date crash in biology

Open PPPinson opened this issue 6 months ago • 0 comments

Description

prepare_measurement_table raises the following error:

MissingConceptError: The DataFrame is missing some columns, namely:
- measurement_date

There are often issues with date columns when going from Spark to pandas. We should rely only on the measurement_datetime column.

Proposed solution: remove measurement_date from the variable "_measurement_required_columns" in utils.check_data.check_data_and_select_columns_measurement.
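To illustrate the proposed change, here is a minimal sketch of a required-columns check in plain pandas. It is not the eds-scikit implementation: the `missing_columns` helper, the `REQUIRED_COLUMNS_WITHOUT_DATE` list and the toy table are hypothetical, and the real `_measurement_required_columns` may contain other columns.

```python
import pandas as pd

# Hypothetical stand-in for the library's list; the real
# `_measurement_required_columns` may contain other columns.
REQUIRED_COLUMNS_WITHOUT_DATE = [
    "measurement_id",
    "person_id",
    "measurement_datetime",
    "value_as_number",
]


def missing_columns(df, required):
    """Return the required columns that are absent from df, in order."""
    return [col for col in required if col not in df.columns]


# Toy measurement table that only carries measurement_datetime.
measurement = pd.DataFrame(
    {
        "measurement_id": [1, 2],
        "person_id": [10, 11],
        "measurement_datetime": pd.to_datetime(
            ["2022-01-03 08:00", "2022-02-14 09:30"]
        ),
        "value_as_number": [6.1, 7.4],
    }
)

# With measurement_date still required, the check reports it as missing.
missing_with_date = missing_columns(
    measurement, REQUIRED_COLUMNS_WITHOUT_DATE + ["measurement_date"]
)

# With measurement_date removed from the requirements, the check passes.
missing_without_date = missing_columns(measurement, REQUIRED_COLUMNS_WITHOUT_DATE)
```

In this sketch, dropping measurement_date from the required list is enough for a table that only has measurement_datetime to pass the check.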

How to reproduce the bug

prepare_measurement_table issue

import eds_scikit
from eds_scikit.biology import prepare_measurement_table, ConceptsSet
from eds_scikit.io import HiveData

# MyDB is a placeholder for the name of the Hive database
data = HiveData(MyDB)

# Raises MissingConceptError (missing column: measurement_date)
leukocytes_set = ConceptsSet("Leukocytes_Blood_Count")
measurement = prepare_measurement_table(
    data,
    start_date="2022-01-01",
    end_date="2022-05-01",
    concept_sets=[leukocytes_set],
    convert_units=False,
    get_all_terminologies=True,
)

date columns issue

sql("SELECT measurement_date FROM measurement limit 10").toPandas()

raises: "AttributeError: Can only use .dt accessor with datetimelike values"
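For context, the same message can be reproduced in plain pandas, without Spark. The sketch below assumes the date column ends up as Python `datetime.date` objects with dtype `object`; whether this is the exact path taken inside `toPandas()` is an assumption, not something confirmed here.

```python
import datetime

import pandas as pd

# Assumption: the date column holds plain datetime.date objects,
# which pandas stores with dtype "object" rather than datetime64.
dates = pd.Series([datetime.date(2022, 1, 3), datetime.date(2022, 2, 14)])

# The .dt accessor is only available on datetime-like dtypes,
# so accessing it on an object column raises AttributeError.
try:
    dates.dt.year
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Converting explicitly to datetime64 makes the accessor usable again.
converted = pd.to_datetime(dates)
years = converted.dt.year.tolist()
```

A measurement_datetime column that is already datetime64 does not go through the object dtype, which fits the suggestion to rely on it alone.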

PPPinson • Aug 08 '25 08:08