Can't download October 2022 Predispatch forecast
I'm trying to download the predispatch price forecast data over the time range:
cache_start_time = "2021/01/01 00:00:00"
cache_end_time = "2023/10/31 00:00:00"
This range should be available, according to the output of:
nemseer.get_data_daterange()
However, when I run the download (see attached jupyter notebook), I get the following error message:
HTTPError: 404 Client Error: Not Found for url: http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/2022/MMSDM_2022_10/MMSDM_Historical_Data_SQLLoader/PREDISP_ALL_DATA/PUBLIC_DVD_PREDISPATCHPRICE_202210010000.zip
It's expecting a file with basename:
PUBLIC_DVD_PREDISPATCHPRICE_202210010000.zip
but the basename that actually exists in that directory is:
PUBLIC_DVD_PREDISPATCHCONSTRAINT1_202210010000.zip
Do you have any suggestions?
Hi @notuntoward
nemseer.get_data_daterange() just checks if a particular month has data - i.e. it scrapes something like http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/2022/ and then sees that all months (01-12) are available. It doesn't check whether particular data files are available.
As you say, it looks like Oct 2022 only has PREDISPATCHCONSTRAINT1 available and not PREDISPATCHPRICE. This is an issue at AEMO's end as they haven't made the data available for that month.
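Because nemseer.get_data_daterange() only confirms that the month directory exists, one way to check a specific file yourself before downloading is a HEAD request against its URL. The following is a minimal sketch using only the standard library, not part of NEMSEER. The function name `remote_file_exists` and its `opener` parameter are hypothetical, and it assumes the server answers HEAD requests the same way it answers GET:

```python
import urllib.error
import urllib.request


def remote_file_exists(url, opener=urllib.request.urlopen, timeout=30):
    """Return True if a HEAD request for `url` succeeds, False on a 404.

    `opener` is injectable so the check can be exercised without network access.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        # Other HTTP errors (e.g. 500) do not prove the file is missing.
        raise
```

Calling `remote_file_exists(...)` with the URL from the error message above would be expected to return False for as long as AEMO has not published that file.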
There are 3 things you could do:
- Skip data for October 2022
- Contact AEMO and ask for this data. I have done this and they have provided it (though sometimes this can take a few weeks/months)
- Get PREDISPATCHPRICE_D for October 2022, though this only contains the last forecast for each interval
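For the first option, one way to skip a month is to split the requested range into sub-ranges that avoid it and run the download once per sub-range. This is a rough sketch, not NEMSEER functionality: `split_around_months` is a hypothetical helper, and the date format is the one used in the question.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"  # format used for cache_start_time / cache_end_time


def split_around_months(start, end, skip_months):
    """Split [start, end] into sub-ranges that avoid whole months.

    `skip_months` is an iterable of (year, month) tuples, e.g. {(2022, 10)}.
    Returns a list of (start, end) strings in the same format as the inputs.
    """
    start_dt = datetime.strptime(start, FMT)
    end_dt = datetime.strptime(end, FMT)
    ranges = []
    cursor = start_dt
    for year, month in sorted(skip_months):
        month_start = datetime(year, month, 1)
        # First instant of the following month (rolls over at December).
        month_end = datetime(year + month // 12, month % 12 + 1, 1)
        if month_end <= cursor or month_start >= end_dt:
            continue  # skipped month lies outside the remaining range
        if month_start > cursor:
            ranges.append((cursor, month_start))
        cursor = max(cursor, month_end)
    if cursor < end_dt:
        ranges.append((cursor, end_dt))
    return [(a.strftime(FMT), b.strftime(FMT)) for a, b in ranges]


sub_ranges = split_around_months(
    "2021/01/01 00:00:00", "2023/10/31 00:00:00", {(2022, 10)}
)
```

Each sub-range would then be passed to the download call in place of the original start and end times. Whether an end time that falls exactly on a month boundary still triggers a request for that month's file is worth verifying; if it does, move the boundary back by one interval.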
Thanks very much for the quick answer. Guess I'll skip October for now and email AEMO and get it eventually.
Coming back to this problem of missing data, does NEMSEER have a way of telling which dates to avoid downloading? I'm hitting dates with bad data again, and it would be better if I knew all of the bad ones ahead of time, so that I could programmatically patch around them, avoiding the crashes.
Hi @notuntoward,
I didn't build that functionality into NEMSEER because I didn't encounter that issue early on.
That being said, it's something I would be open to implementing.
What I did implement was a stub file (.invalid_aemo_files.txt) that was written whenever a bad zipfile was encountered. The downloader checks the stub file and doesn't download that file again if it's found to be corrupt: https://github.com/UNSW-CEEM/NEMSEER/blob/da08c28778fe0aa9b6f158323380863dcf9bc4d7/src/nemseer/downloader.py#L318-L327
An ideal built-in solution would probably use a try...except clause for the download (probably somewhere here, or in the Downloader class) and also write to a stub file with a name like .missing_aemo_files.txt. Then the Downloader would check both stubs before downloading files, and issue a warning if there were missing or invalid files. I wouldn't try to build in a "file verifier" (i.e. a scraper that checks files at a URL) - this is easy to do by extending existing code, but it would add latency to downloading files via NEMSEER.
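The try...except plus stub-file idea could look roughly like the sketch below. It is illustrative only and is not NEMSEER code: `read_stub`, `download_or_record_missing` and the `fetch` callable are hypothetical names, and the standard library's `urllib.error.HTTPError` stands in for the `requests` exception that appears in the traceback above.

```python
import urllib.error
import warnings
from pathlib import Path

# Stub name suggested above; this file does not exist in NEMSEER today.
MISSING_STUB = ".missing_aemo_files.txt"


def read_stub(cache_dir, stub_name=MISSING_STUB):
    """Return the set of filenames recorded in a stub file (empty if absent)."""
    stub = Path(cache_dir) / stub_name
    if not stub.exists():
        return set()
    return set(stub.read_text().splitlines())


def download_or_record_missing(url, filename, cache_dir, fetch):
    """Try to download `url`; on a 404, record `filename` in the stub and warn.

    `fetch` is a hypothetical callable that downloads `url` and raises
    HTTPError on failure. Returns True if the download was attempted and
    succeeded, False if the file is (or was previously found to be) missing.
    """
    if filename in read_stub(cache_dir):
        warnings.warn(f"{filename} was previously recorded as missing; skipping")
        return False
    try:
        fetch(url)
    except urllib.error.HTTPError as err:
        if err.code != 404:
            raise  # only a 404 is treated as "file not published"
        with open(Path(cache_dir) / MISSING_STUB, "a") as stub:
            stub.write(filename + "\n")
        warnings.warn(f"{filename} not found at {url}; recorded in {MISSING_STUB}")
        return False
    return True
```

Recording the miss on the first 404 means later runs skip the file without any extra request, which avoids the latency of a separate file verifier.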
I'm quite busy finishing up writing my thesis at the moment. If you have some time @notuntoward I'd be happy to review a pull request if you're able to pull one together.
Abi
Thanks for the suggestions. I know how thesis writing goes...