[BUG] Global commodities column name mapping conflict
Describe the bug
Older global_commodities input files will have headings Commodity, CommodityName etc.
The mapping from these old column names to the new names is:
Commodity -> description
CommodityName -> commodity
There is a problem here: how does muse know whether a column called "commodity" is in the old parlance or the new parlance?
At present the mapping Commodity -> description does not seem to be listed in csv.py
If we added it, it would also incorrectly rename a new-style "commodity" column, because the mapping is applied after the column names have been converted to lower snake case.
One tempting fix would be to do the mapping before the case change. However, this would put us back into a situation of being case-sensitive, which we were trying to get away from...
Hopefully there is a better solution? (Or I have misunderstood the cause of the problem..?)
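To make the clash concrete, here is a minimal sketch of the behaviour described above. It is not the code in csv.py: `to_snake_case`, `LEGACY_NAMES` and `standardize_columns` are hypothetical stand-ins, and the mapping is assumed to contain only the CommodityName -> commodity entry.

```python
import re

import pandas as pd


def to_snake_case(name: str) -> str:
    """Convert e.g. 'CommodityName' to 'commodity_name' (hypothetical helper)."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


# Assumed legacy-name mapping, applied AFTER the case conversion.
# Adding "commodity" -> "description" here would fix old-style files,
# but it would also rename the valid new-style "commodity" column.
LEGACY_NAMES = {"commodity_name": "commodity"}


def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Normalise the case of column names, then apply the legacy mapping."""
    columns = [to_snake_case(str(c)) for c in df.columns]
    columns = [LEGACY_NAMES.get(c, c) for c in columns]
    duplicates = sorted({c for c in columns if columns.count(c) > 1})
    if duplicates:
        raise ValueError(f"Duplicate columns: {duplicates}")
    out = df.copy()
    out.columns = columns
    return out


# Old-style headings: both "Commodity" and "CommodityName" end up as "commodity".
old_style = pd.DataFrame(columns=["Commodity", "CommodityType", "CommodityName"])
try:
    standardize_columns(old_style)
    error_message = None
except ValueError as err:
    error_message = str(err)

# New-style headings pass through unchanged.
new_style = pd.DataFrame(columns=["commodity", "description"])
new_columns = list(standardize_columns(new_style).columns)
```

Once the names have been normalised, nothing distinguishes an old-style "Commodity" description column from a new-style "commodity" name column, which is why the mapping alone cannot resolve the conflict.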
To Reproduce
Run the function standardize_dataframe() from csv.py on the input file global_commodities.csv
Expected behavior
`ValueError: Duplicate columns in Index(['commodity', 'commodity_type', 'commodity', 'emmission_factor'], dtype='object')`
Context
Please, complete the following to better understand the system you are using to run MUSE.
- Operating system (eg. Windows 10): Mac OSX
- MUSE version (eg. 1.0.1): b779f2339abaddcc76e0b196192f8625ebb4486a
- Installation method (eg. pipx, pip, development mode): development mode
- Python version (you can get this running python --version): 3.9.18
So the old style format is:
Commodity: description of the commodity, just for user reference
CommodityName: name used for the commodity within MUSE and the other input files
In this case, we only actually need the CommodityName field within MUSE.
This is a bit of a hack, but I've dealt with this in MUSE by dropping the Commodity (/commodity) column in the case that there's also a CommodityName (/commodity_name) column present (i.e. where it's following the old format). I do this within read_global_commodities_csv before passing to standardize_dataframe.
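As a rough illustration of that workaround (a sketch only, not the actual body of read_global_commodities_csv), the drop could look like the following. The helper name `drop_legacy_description` is hypothetical, and the case-insensitive comparison is an assumption.

```python
import pandas as pd


def drop_legacy_description(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the old-style description column so it cannot clash later.

    Hypothetical helper: column names are compared case-insensitively, and the
    file is treated as old-style only if a CommodityName / commodity_name
    column is present.
    """
    lowered = {str(c).lower(): c for c in df.columns}
    has_name = "commodityname" in lowered or "commodity_name" in lowered
    if has_name and "commodity" in lowered:
        return df.drop(columns=lowered["commodity"])
    return df


old_style = pd.DataFrame({"Commodity": ["Heating"], "CommodityName": ["heat"]})
new_style = pd.DataFrame({"commodity": ["heat"], "description": ["Heating"]})

old_columns = list(drop_legacy_description(old_style).columns)
new_columns = list(drop_legacy_description(new_style).columns)
```

The new-style frame is left untouched because it has no CommodityName column.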
For your purposes, I think the possible solutions are:
a) Manually rename the columns before passing to standardize_dataframe
b) Use read_global_commodities_csv to read the file, instead of reading with pandas and passing to standardize_dataframe. In this case the Commodity column will be dropped to avoid the clash, but I guess in doing so you'll lose the description info which may/may not be a problem. I could always modify read_global_commodities_csv so it renames "commodity" -> "description" rather than dropping it. Would that be useful?
If we modify read_global_commodities_csv to rename "commodity" -> "description", how will that script know that the input file was in the old format? ("commodity" is a correct column name in the new format, so we wouldn't want it to be changed.)
Basically the problem I'm alluding to is that commodity (case insensitive) is a valid column name in both the new and old formats.
To clarify, it would only do that if a “commodity_name” column is also present. Similar to what it’s currently doing if you look at the code, but just renaming the “commodity” column rather than dropping it
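A minimal sketch of that conditional rename, assuming the column names have already been converted to snake case. The function name `rename_legacy_columns` is hypothetical and this is not the current MUSE code.

```python
import pandas as pd


def rename_legacy_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename old-style columns only when the old-format marker is present.

    Hypothetical helper. The presence of "commodity_name" is taken as the
    signal that the file follows the old format.
    """
    if "commodity_name" not in df.columns:
        # New format: "commodity" already holds the name, so leave it alone.
        return df
    # Old format: keep the description instead of dropping it.
    return df.rename(
        columns={"commodity": "description", "commodity_name": "commodity"}
    )


old_style = pd.DataFrame({"commodity": ["Heating"], "commodity_name": ["heat"]})
new_style = pd.DataFrame({"commodity": ["heat"], "description": ["Heating"]})

old_result = rename_legacy_columns(old_style)
new_result = rename_legacy_columns(new_style)
```

Both frames end up with a "commodity" column holding the name used within MUSE and a "description" column holding the free-text description.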
So do you think this issue will never arise as a problem when running muse in full?
If someone's following either the old convention or the new convention then everything should be fine