[BUG] Global commodities column name mapping conflict (Issue #795)

Open • martinstringer opened this issue 4 months ago • 5 comments

Describe the bug

Older global_commodities input files have headings such as Commodity and CommodityName.

The mapping from these old column names to the new names is:

  • Commodity -> description
  • CommodityName -> commodity

There is a problem here: how does MUSE know whether a column called "commodity" follows the old convention or the new one?

At present, the mapping Commodity -> description does not seem to be listed in csv.py.

If we added it, it would also rename a new-style "commodity" column, because the mapping is applied after the column names have been normalized to snake_case (e.g. CommodityName -> commodity_name).
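A minimal sketch of the ordering problem, for illustration only. `to_snake` and `standardize_columns` are hypothetical stand-ins, not the actual code in csv.py; the sketch assumes the rename map is keyed on snake_case names and applied after normalization, with the proposed Commodity -> description entry added.

```python
import re

def to_snake(name: str) -> str:
    """Hypothetical normalizer: CamelCase -> snake_case (CommodityName -> commodity_name)."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

# Old -> new renames keyed on the *normalized* names,
# including the proposed "commodity" -> "description" entry.
RENAMES = {"commodity": "description", "commodity_name": "commodity"}

def standardize_columns(columns: list) -> list:
    """Normalize case first, then apply RENAMES (the order described in the issue)."""
    return [RENAMES.get(to_snake(c), to_snake(c)) for c in columns]

old_style = ["Commodity", "CommodityName", "CommodityType"]
new_style = ["commodity", "description", "commodity_type"]

print(standardize_columns(old_style))  # ['description', 'commodity', 'commodity_type']
print(standardize_columns(new_style))  # ['description', 'description', 'commodity_type']
```

The old-style headers come out right, but the new-style file loses its real commodity column: once normalized, the two files' "commodity" headers are indistinguishable.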

One tempting fix would be to do the mapping before the case change. However, this would put us back into a situation of being case-sensitive, which we were trying to get away from...

Hopefully there is a better solution? (Or have I misunderstood the cause of the problem?)

To Reproduce

Run standardize_dataframe() from csv.py on the input file global_commodities.csv.

Actual behavior

`ValueError: Duplicate columns in Index(['commodity', 'commodity_type', 'commodity', 'emmission_factor'], dtype='object')`
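A self-contained sketch that reproduces the same kind of duplicate with plain pandas, assuming the standardization step normalizes headers to snake_case and currently maps only commodity_name -> commodity. The function names are hypothetical and the header list is illustrative, modelled on the error message above.

```python
import re

import pandas as pd

def to_snake(name: str) -> str:
    # Hypothetical CamelCase -> snake_case normalizer (not MUSE's own helper)
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

# Only the CommodityName -> commodity rename, as csv.py appears to do today
RENAMES = {"commodity_name": "commodity"}

def normalize_columns(columns) -> pd.Index:
    """Normalize each header, then apply RENAMES."""
    return pd.Index([RENAMES.get(to_snake(c), to_snake(c)) for c in columns])

def standardize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with normalized headers, rejecting duplicate column names."""
    out = df.copy()
    out.columns = normalize_columns(df.columns)
    if out.columns.duplicated().any():
        raise ValueError(f"Duplicate columns in {out.columns}")
    return out

# Old-style headers: Commodity (description) and CommodityName (identifier)
old_style = pd.DataFrame(
    columns=["Commodity", "CommodityType", "CommodityName", "EmmissionFactor"]
)
try:
    standardize_columns(old_style)
except ValueError as err:
    print(err)  # both Commodity and CommodityName end up as 'commodity'
```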

Context

Please complete the following to help us understand the system you are using to run MUSE.

  • Operating system (eg. Windows 10): Mac OSX
  • MUSE version (eg. 1.0.1): b779f2339abaddcc76e0b196192f8625ebb4486a
  • Installation method (eg. pipx, pip, development mode): development mode
  • Python version (you can get this by running python --version): 3.9.18

martinstringer • Oct 03 '25 11:10

So the old-style format is:

  • Commodity: description of the commodity, just for user reference
  • CommodityName: name used for the commodity within MUSE and the other input files

In this case, we only actually need the CommodityName field within MUSE.

This is a bit of a hack, but I've dealt with this in MUSE by dropping the Commodity (/commodity) column in the case that there's also a CommodityName (/commodity_name) column present (i.e. where it's following the old format). I do this within read_global_commodities_csv before passing to standardize_dataframe.
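The workaround described above could look roughly like this. It is an illustration under assumptions, not the code in read_global_commodities_csv: `drop_old_description` and `to_snake` are hypothetical names, and headers are normalized before the check so that detecting the old format stays case-insensitive.

```python
import re

import pandas as pd

def to_snake(name: str) -> str:
    # Hypothetical CamelCase -> snake_case normalizer
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

def drop_old_description(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the old-style Commodity (description) column when CommodityName is present."""
    # Map each normalized header back to the header as written in the file
    original = {to_snake(c): c for c in df.columns}
    if "commodity_name" in original and "commodity" in original:
        # Old convention: "commodity" only holds a human-readable description
        return df.drop(columns=[original["commodity"]])
    # New convention: leave the frame untouched
    return df

old = pd.DataFrame(columns=["Commodity", "CommodityName", "CommodityType"])
print(list(drop_old_description(old).columns))  # ['CommodityName', 'CommodityType']
```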

For your purposes, I think the possible solutions are:

  a) Manually rename the columns before passing them to standardize_dataframe.
  b) Use read_global_commodities_csv to read the file, instead of reading it with pandas and passing it to standardize_dataframe. In this case the Commodity column will be dropped to avoid the clash, but in doing so you'll lose the description info, which may or may not be a problem.

I could always modify read_global_commodities_csv so that it renames "commodity" -> "description" rather than dropping it. Would that be useful?
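Option (a) might look like the sketch below, assuming the file uses exactly the old CamelCase headers. The mapping runs before any case normalization, so it is case-sensitive (the trade-off raised in the issue), which is also why a new-style lowercase "commodity" column is left alone. `rename_old_headers` is a hypothetical helper, and the standardize_dataframe call is omitted because its signature isn't shown in this thread.

```python
import pandas as pd

# Old-convention headers -> new names, matched exactly (case-sensitive)
OLD_TO_NEW = {"Commodity": "description", "CommodityName": "commodity"}

def rename_old_headers(df: pd.DataFrame) -> pd.DataFrame:
    # DataFrame.rename looks each label up in the dict once, so
    # Commodity -> description and CommodityName -> commodity do not chain
    return df.rename(columns=OLD_TO_NEW)

# e.g. df = rename_old_headers(pd.read_csv("global_commodities.csv"))
old = pd.DataFrame(columns=["Commodity", "CommodityName", "CommodityType"])
print(list(rename_old_headers(old).columns))  # ['description', 'commodity', 'CommodityType']
```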

tsmbland • Oct 03 '25 13:10

If we modify read_global_commodities_csv to rename "commodity" -> "description", how will that function know that the input file is in the old format? ("commodity" is a correct column name in the new format, so we wouldn't want it to be renamed.)

Basically, the problem I'm alluding to is that "commodity" (case-insensitive) is a valid column name in both the old and new formats.

martinstringer • Oct 03 '25 15:10

To clarify, it would only do that if a "commodity_name" column is also present. That is similar to what it currently does (see the code), except that it would rename the "commodity" column rather than drop it.
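A minimal sketch of that rule: the presence of a commodity_name column is what marks a file as old-style, and only then is "commodity" treated as a description. The names are hypothetical, headers are compared after a snake_case normalization so the check stays case-insensitive, and the final commodity_name -> commodity step stands in for the existing mapping in csv.py.

```python
import re

import pandas as pd

def to_snake(name: str) -> str:
    # Hypothetical CamelCase -> snake_case normalizer
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

def to_new_convention(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with snake_case headers following the new convention."""
    out = df.rename(columns=to_snake)
    if "commodity_name" in out.columns and "commodity" in out.columns:
        # Old convention detected: keep the description instead of dropping it
        out = out.rename(columns={"commodity": "description"})
    # Old CommodityName -> new commodity (the existing mapping)
    return out.rename(columns={"commodity_name": "commodity"})

old = pd.DataFrame(columns=["Commodity", "CommodityName", "CommodityType"])
new = pd.DataFrame(columns=["description", "commodity", "commodity_type"])
print(list(to_new_convention(old).columns))  # ['description', 'commodity', 'commodity_type']
print(list(to_new_convention(new).columns))  # ['description', 'commodity', 'commodity_type']
```

Both files end up with the same headers. A file that mixes conventions (a new-style commodity column alongside an old-style CommodityName) would still be misread, since it would be treated as old-style.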

tsmbland • Oct 03 '25 16:10

So do you think this issue will never arise as a problem when running MUSE in full?

martinstringer • Oct 06 '25 09:10

> So do you think this issue will never arise as a problem when running MUSE in full?

If someone's following either the old convention or the new convention, then everything should be fine.

tsmbland • Oct 06 '25 09:10