NEMSEER

Can't download October 2022 Predispatch forecast

Open notuntoward opened this issue 2 years ago • 5 comments

I'm trying to download the predispatch price forecast data over the time range:

cache_start_time = "2021/01/01 00:00:00"
cache_end_time = "2023/10/31 00:00:00"

This range should be available, according to the output of:

nemseer.get_data_daterange()

However, when I run the download (see the attached Jupyter notebook), I get the following error message:

HTTPError: 404 Client Error: Not Found for url: http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/2022/MMSDM_2022_10/MMSDM_Historical_Data_SQLLoader/PREDISP_ALL_DATA/PUBLIC_DVD_PREDISPATCHPRICE_202210010000.zip

It's expecting a file with basename:

PUBLIC_DVD_PREDISPATCHPRICE_202210010000.zip

but the basename that actually exists in that directory is:

PUBLIC_DVD_PREDISPATCHCONSTRAINT1_202210010000.zip

Do you have any suggestions?

mkFeats_ERROR_REPRO_is_really_ipynb.txt

notuntoward avatar Nov 15 '23 05:11 notuntoward

Hi @notuntoward

nemseer.get_data_daterange() just checks if a particular month has data - i.e. it scrapes something like http://www.nemweb.com.au/Data_Archive/Wholesale_Electricity/MMSDM/2022/ and then sees that all months (01-12) are available. It doesn't check whether particular data files are available.

As you say, it looks like Oct 2022 only has PREDISPATCHCONSTRAINT1 available and not PREDISPATCHPRICE. This is an issue at AEMO's end as they haven't made the data available for that month.

There are 3 things you could do:

  1. Skip data for October 2022
  2. Contact AEMO and ask for this data. I have done this and they have provided it (though sometimes this can take a few weeks/months)
  3. Get PREDISPATCHPRICE_D for October 2022, though this only contains the last forecast for each interval
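
For option 1, here is a minimal sketch of splitting the requested range into sub-ranges that avoid a known-bad month, so each sub-range can be downloaded separately. The helper `split_range_skipping_months` is hypothetical and not part of NEMSEER. The timestamp format follows the strings in the original post and may need adjusting to whatever format nemseer accepts. Whether nemseer treats the boundary timestamps as inclusive is not checked here, so the edges may need nudging.

```python
from datetime import datetime, timedelta

# Timestamp format taken from the strings in the issue (an assumption;
# adjust it to the format your nemseer version expects).
FMT = "%Y/%m/%d %H:%M:%S"


def next_month_start(dt):
    """Return midnight on the first day of the month after dt."""
    first = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Adding 32 days to the 1st always lands in the following month.
    return (first + timedelta(days=32)).replace(day=1)


def split_range_skipping_months(start, end, skip):
    """Split [start, end] into sub-ranges that avoid the months in skip.

    skip is a set of (year, month) tuples, e.g. {(2022, 10)}.
    Returns a list of (start, end) strings in the same format as the input.
    """
    start_dt = datetime.strptime(start, FMT)
    end_dt = datetime.strptime(end, FMT)
    ranges = []
    segment_start = None
    cursor = start_dt
    while cursor < end_dt:
        if (cursor.year, cursor.month) in skip:
            # Close the open segment at the start of the skipped month.
            if segment_start is not None:
                ranges.append((segment_start, cursor))
                segment_start = None
        elif segment_start is None:
            segment_start = cursor
        cursor = min(next_month_start(cursor), end_dt)
    if segment_start is not None:
        ranges.append((segment_start, end_dt))
    return [(a.strftime(FMT), b.strftime(FMT)) for a, b in ranges]


sub_ranges = split_range_skipping_months(
    "2021/01/01 00:00:00", "2023/10/31 00:00:00", {(2022, 10)}
)
```

Each pair in `sub_ranges` can then be passed to the download call in place of the original `cache_start_time` and `cache_end_time`.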

prakaa avatar Nov 16 '23 02:11 prakaa

Thanks very much for the quick answer. Guess I'll skip October for now and email AEMO and get it eventually.

notuntoward avatar Nov 16 '23 04:11 notuntoward

Coming back to this problem of missing data, does NEMSEER have a way of telling which dates to avoid downloading? I'm hitting dates with bad data again, and it would be better if I knew all of the bad ones ahead of time, so that I could programmatically patch around them, avoiding the crashes.

notuntoward avatar Feb 14 '24 23:02 notuntoward

Hi @notuntoward,

I didn't build that functionality into NEMSEER because I didn't encounter that issue early on.

That being said, it's something I would be open to implementing.

What I did implement was a stub file (.invalid_aemo_files.txt) that was written whenever a bad zipfile was encountered. The downloader checks the stub file and doesn't download that file again if it's found to be corrupt: https://github.com/UNSW-CEEM/NEMSEER/blob/da08c28778fe0aa9b6f158323380863dcf9bc4d7/src/nemseer/downloader.py#L318-L327

An ideal in-built solution would probably use a try...except clause for the download (probably somewhere here, or in the Downloader class) and also write to a stub file with a name like .missing_aemo_files.txt. Then the Downloader would check both stubs before downloading files, and issue a warning if there were missing or invalid files. I wouldn't try to build in a "file verifier" (i.e. a scraper that checks files at a URL). That is easy to do by extending existing code, but it would add latency to downloading files via NEMSEER.
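
A rough sketch of that stub-file idea, to make the proposal concrete. This is not NEMSEER's actual code: the function `download_unless_known_bad`, its `fetch` parameter, and the one-filename-per-line stub layout are assumptions for illustration. It uses the standard library's `urllib.error.HTTPError` as a stand-in; with `requests`, the equivalent would be catching `requests.exceptions.HTTPError` and checking `err.response.status_code`.

```python
import warnings
from pathlib import Path
from urllib.error import HTTPError

# Stub file names: the first is the proposed new stub, the second is the
# existing stub mentioned above. One filename per line is an assumed layout.
MISSING_STUB = ".missing_aemo_files.txt"
INVALID_STUB = ".invalid_aemo_files.txt"


def read_stub(path):
    """Return the set of filenames recorded in a stub file (empty if absent)."""
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def download_unless_known_bad(url, cache_dir, fetch):
    """Call fetch(url) unless the file is already listed in either stub.

    fetch is any callable that downloads the URL and raises HTTPError on
    failure (hypothetical; swap in the real download routine).
    Returns True if the download ran, False if the file was skipped.
    """
    cache_dir = Path(cache_dir)
    filename = url.rsplit("/", 1)[-1]
    known_bad = read_stub(cache_dir / MISSING_STUB) | read_stub(cache_dir / INVALID_STUB)
    if filename in known_bad:
        warnings.warn(f"Skipping {filename}: listed as missing or invalid")
        return False
    try:
        fetch(url)
    except HTTPError as err:
        if err.code != 404:
            raise  # only "not found" is treated as a missing file
        with open(cache_dir / MISSING_STUB, "a") as stub:
            stub.write(filename + "\n")
        warnings.warn(f"{filename} is not available at {url}")
        return False
    return True
```

Recording the 404 locally means later runs skip the file without another request, which avoids the latency of scraping the directory up front.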

I'm quite busy finishing up writing my thesis at the moment. If you have some time @notuntoward I'd be happy to review a pull request if you're able to pull one together.

Abi

prakaa avatar Feb 15 '24 00:02 prakaa

Thanks for the suggestions. I know how thesis writing goes...

notuntoward avatar Feb 27 '24 18:02 notuntoward