
Column `expanded` in `meta.csv` contains `NaN` values

Open rubencart opened this issue 2 years ago • 4 comments

What do these mean?

```python
import pandas as pd

df = pd.read_csv('meta.csv')
print(df.expanded.isna().sum())  # 5878
```

rubencart • Apr 09 '24 17:04

Hi @rubencart, please be more specific when you describe any issue in ChoCo.

jonnybluesman • Apr 10 '24 10:04

Sorry, I thought the question was clear. The `expanded` column of the provided meta.csv contains three kinds of values: True, False, and something that pandas reads as NaN. My question is: what do these three values represent?

  1. What does it mean for a song to be expanded? (I could not find this explained anywhere.)
  2. What is the difference between a song with expanded == NaN and one with expanded == False?

```python
import pandas as pd

df = pd.read_csv('data/.../meta.csv')
(df.expanded == True).sum()   # 8202
(df.expanded == False).sum()  # 6006
df.expanded.isna().sum()      # 5878
len(df)                       # 20086
8202 + 6006 + 5878            # 20086
```
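For reference, the three categories can also be tallied in a single call with `Series.value_counts(dropna=False)`, which keeps the missing entries in the count. The sketch below uses a small in-memory CSV as a stand-in for `meta.csv` (the ids and values are made up for illustration, not taken from ChoCo). It also converts the column to pandas' nullable `boolean` dtype, so missing entries become `<NA>` instead of a float `NaN` mixed into an object column.

```python
import io

import pandas as pd

# Hypothetical stand-in for meta.csv: an `expanded` column with
# True, False and empty cells (empty cells are read as NaN).
csv_text = """id,expanded
a,True
b,False
c,
d,True
e,
"""

df = pd.read_csv(io.StringIO(csv_text))

# A column mixing True/False with missing values is read with object dtype.
print(df["expanded"].dtype)

# Tally all three categories, keeping the missing values.
counts = df["expanded"].value_counts(dropna=False)
print(counts)

# Nullable boolean dtype: each entry is True, False or <NA>.
expanded = df["expanded"].astype("boolean")
print(expanded.isna().sum())
```

With the nullable dtype, comparisons such as `expanded == True` propagate `<NA>` for the missing rows, which keeps "unknown" distinct from `False`.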

rubencart • Apr 10 '24 13:04

Thanks for clarifying. At the moment, the script that generates meta.csv iterates over all JAMS files and extracts the information summarised in the CSV. It may still contain bugs when the corresponding fields in the JAMS files are not consistent, so the expansion attribute needs to be checked (this is probably a simple parsing bug). I will keep this issue open so that we can look into it.
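One way such inconsistencies can surface as `NaN` is when some records lack the field entirely, or store it in a form the extraction step does not recognise. The sketch below is purely hypothetical: the record dicts stand in for metadata parsed from JAMS files, and `parse_expanded`, `TRUE_VALUES` and `FALSE_VALUES` are illustrative names, not part of the ChoCo code base. Recognised spellings map to `True`/`False`; anything else becomes `None`, which pandas reports as missing.

```python
import pandas as pd

# Hypothetical metadata records, standing in for fields extracted
# from JAMS files; the spellings are invented for illustration.
records = [
    {"id": "s1", "expanded": True},
    {"id": "s2", "expanded": "false"},
    {"id": "s3", "expanded": "yes"},  # unrecognised spelling
    {"id": "s4"},                     # field missing entirely
]

# Hypothetical sets of accepted string spellings.
TRUE_VALUES = {"true", "1"}
FALSE_VALUES = {"false", "0"}


def parse_expanded(value):
    """Return True/False for recognised values, None otherwise (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in TRUE_VALUES:
            return True
        if text in FALSE_VALUES:
            return False
    # Missing or unrecognised: keep it explicitly unknown.
    return None


# Without normalisation, only the record lacking the key becomes NaN;
# the raw strings "false" and "yes" pass through unchanged.
raw = pd.DataFrame(records)
print(raw["expanded"].isna().sum())

# With normalisation, unknown values are explicit missing entries.
rows = [{"id": r["id"], "expanded": parse_expanded(r.get("expanded"))} for r in records]
df = pd.DataFrame(rows)
print(df)
print(df["expanded"].isna().sum())
```

Returning `None` rather than guessing `False` keeps "we could not determine this" separate from "not expanded", which is exactly the distinction the question above asks about.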

As for your first question: for a score-based annotation, expansion means that the score "has been expanded" to flatten out all repetitions. For example, if a sequence of bars (with chord annotations) has a repeat sign, we expand the score by actually unrolling the repetition, as if the score were being performed. This is the case, for example, for the entire ireal-pro subset.
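As a rough illustration of what unrolling a repetition means, the sketch below expands a flat list of bars in which `|:` and `:|` markers delimit a section that is played twice. The bar representation (one chord label per bar), the marker tokens and the `expand_repeats` function are assumptions made for this example, not ChoCo's actual implementation. It deliberately ignores voltas (first/second endings), D.C./D.S. jumps and nested repeats, which a full score expansion would also need to handle.

```python
from typing import List

# Hypothetical repeat markers used only in this sketch.
REPEAT_START = "|:"
REPEAT_END = ":|"


def expand_repeats(tokens: List[str]) -> List[str]:
    """Unroll simple, non-nested repeats so each repeated section appears twice.

    If a REPEAT_END has no matching REPEAT_START, the repeat goes back to the
    start of the piece, or to just after the previous REPEAT_END.
    """
    expanded: List[str] = []
    section_start = 0  # index in `expanded` where the current repeat begins
    for token in tokens:
        if token == REPEAT_START:
            section_start = len(expanded)
        elif token == REPEAT_END:
            # Play the section once more, as if the score were performed.
            expanded.extend(expanded[section_start:])
            section_start = len(expanded)
        else:
            expanded.append(token)
    return expanded


score = ["C:maj", REPEAT_START, "A:min", "D:min", "G:7", REPEAT_END, "C:maj"]
print(expand_repeats(score))
# ['C:maj', 'A:min', 'D:min', 'G:7', 'A:min', 'D:min', 'G:7', 'C:maj']
```

The expanded sequence contains no repeat markers, so each chord occurrence corresponds to a bar as it would actually be heard.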

jonnybluesman • Apr 10 '24 16:04

I see, thank you!

rubencart • Apr 11 '24 13:04