simple_ado ADOGitClient class download_zip method not displaying progress bar

I believe this is due to the server in question responding without a content-length header because of chunked encoding.

I initially found your project in a roundabout way because of a similar issue with azure-devops-python-api where when pulling a tree zip, a Generator would be created, so there was no reasonable way to get a progress update bar going. The interface you've created is a lot simpler and easy to use, which is perfect for my purposes, and has made my development much more streamlined, but it would be nice to find a way to show a progress bar. For large mono-repos, a download can take quite some time.

Perhaps there is a way to recursively get all items from a tree via a given path and then calculate an approximate total size? I played a bit with hacking up download_zip to simply get all tree entries from a given path, ignoring content and grabbing metadata, and it did seem to have sizes returned. Perhaps if we summed these, we could simply pass them into the download_zip call and have that be the fall back if the content-length isn't provided?

Dec 24 '22 07:12 natemcfeters

Looked into this a bit more... I think the issue here is that if the server wants to respond with Chunked Encoding, and doesn't want to provide a Content-Length (which I think the spec says it shouldn't), I don't think there's a way to have a progress bar that shows much actual progress. That said, I did find a nice stackoverflow article that I think provides a better option than what's currently there. See: Progress Bar while Download File Over HTTP with Requests. Credit to original author Mike.

Dec 24 '22 08:12 natemcfeters

Yup, you are correct about the root issue here. Unfortunately, if they don't tell us the length, there's nothing we can do during the request.

I do like your idea for iterating and getting an approximate size, but I'm not sure how feasible that would be since you wouldn't reliably be able to guess the compressed size?

Dec 24 '22 09:12 dalemyers

I've managed to get around this some... obviously you don't see a progress bar indicating progress if you don't get a content-length, but I've created a sensible fall back (IMO). See code below:

def download_from_response_stream(
    ...
    path = pathlib.Path(output_path).expanduser().resolve()
    path.parent.mkdir(parents=True, exist_ok=True)
    file_size = int(response.headers.get('Content-Length', 0))
    desc = "(Unknown total file size)" if file_size == 0 else "Downloaded"
    response.raw.read = functools.partial(response.raw.read, decode_content=True)  # Decompress if needed
    downloaded = 0
    with tqdm.wrapattr(
            path.open("wb"), "write", unit='B', unit_scale=True, unit_divisor=1024, miniters=1,desc=desc, total=file_size
    ) as fout:
        for chunk in response.iter_content(chunk_size=4096):
            downloaded+=len(chunk)
            fout.write(chunk)
    downloaded = str("{:.2f}".format(int(downloaded)/1048576))
    print(Fore.RED, f"Downloaded: {downloaded}MB") if file_size == 0 else print(Fore.BLUE, "Complete.")

Obviously this depends upon tqdm and colorama. Seems to work. Example output below shows what happens when you don't get a content length header (if you do get one, then you'll get a progress bar).

(Unknown total file size): 8.78MB [00:04, 1.94MB/s]
 Downloaded: 8.78MB

Dec 24 '22 19:12 natemcfeters

Yeah, this is the best thing you could do, however I don't think it's up to the library to do this. A callback should be provided by the caller and when there's a progress update, we should call it with the downloaded bytes and the total bytes and then the original caller can do what they want with it.

Dec 27 '22 09:12 dalemyers

Alright, I can see about modifying it as such.

Dec 28 '22 20:12 natemcfeters

@dalemyers I've modified simple_ado to handle a callback to address this as you've suggested and also produced a small test case to illustrate it. I did not add the test case into your pytest framework, but could do so at a later time, if desired. You can find that here: https://github.com/natemcfeters/simple_ado/tree/progress_callback.

Dec 30 '22 00:12 natemcfeters

I opened those commits as a PR to make it easier to what has changed: https://github.com/microsoft/simple_ado/pull/16 I've also added a review.

Dec 30 '22 12:12 dalemyers

Cool I'll have a look at it and see what I can do to fix it up.

Dec 30 '22 21:12 natemcfeters

Ok, I made a new pull request to fix up comments 1 to 3, but the 4th comment on handling the actual call back I've not resolved yet, as I need to explore it more. I have two questions there:

Why wouldn't we want to simply delegate the total control here to the user? If it's a question of leaving too much in their hands to do I think that could be resolved with documentation or with adding my test case as an example.
If you still want to see the change you've suggested, what do you want the expected behavior to be? Should we be yielding a block at a time back to them and letting them write to disk? Simply yielding back to them the size read against total size?

I feel like if we do too much handling ourselves here, then the user has less control over how they'd like to display progress.

Dec 30 '22 22:12 natemcfeters