filesystem_spec icon indicating copy to clipboard operation
filesystem_spec copied to clipboard

FTPFileSystem open file cannot read from filezilla server

Open zzl221000 opened this issue 2 years ago • 5 comments

The ABOR command is sent when the ftp file download is finished, the server no longer responds and raises the TimeoutError

code:

def test1():
    fs: FTPFileSystem = fsspec.filesystem('ftp', asynchronous=True, host='127.0.0.1', port=21, username='admin',
                                          password='admin', timeout=10)

    with fs.open(fs.ls('/realtime_update')[0]['name'], compression='gzip') as f:
        f.read()

fsspec.implementations.ftp.FTPFile._fetch_range

 # stop transfer, we got enough bytes for this block
                self.fs.ftp.abort()

debug log

*get* '150 Starting data transfer.\n'
*resp* '150 Starting data transfer.'
*put urgent* b'ABOR\r\n'
*get* '226 Operation successful\n'
Traceback (most recent call last):
  File "F:\project\python\external_ftp_pipeline\.venv\Lib\site-packages\fsspec\implementations\ftp.py", line 323, in _fetch_range
    self.fs.ftp.retrbinary(
  File "D:\Applications\scoop\apps\python\current\Lib\ftplib.py", line 441, in retrbinary
    callback(data)
  File "F:\project\python\external_ftp_pipeline\.venv\Lib\site-packages\fsspec\implementations\ftp.py", line 315, in callback
    raise TransferDone
fsspec.implementations.ftp.TransferDone

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\project\python\external_ftp_pipeline\src\external_ftp\gongshang_ftp.py", line 154, in <module>
    asyncio.run(main())
  File "D:\Applications\scoop\apps\python\current\Lib\asyncio\runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "D:\Applications\scoop\apps\python\current\Lib\asyncio\runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Applications\scoop\apps\python\current\Lib\asyncio\base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "F:\project\python\external_ftp_pipeline\src\external_ftp\gongshang_ftp.py", line 136, in main
    for tag, data in load_ftp_infos():
  File "F:\project\python\external_ftp_pipeline\src\external_ftp\gongshang_ftp.py", line 127, in load_ftp_infos
    for data in decode_lines(f):
  File "F:\project\python\external_ftp_pipeline\src\external_ftp\gongshang_ftp.py", line 117, in decode_lines
    for line in source:
  File "D:\Applications\scoop\apps\python\current\Lib\gzip.py", line 399, in readline
    return self._buffer.readline(size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Applications\scoop\apps\python\current\Lib\_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Applications\scoop\apps\python\current\Lib\gzip.py", line 499, in read
    if not self._read_gzip_header():
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Applications\scoop\apps\python\current\Lib\gzip.py", line 468, in _read_gzip_header
    last_mtime = _read_gzip_header(self._fp)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Applications\scoop\apps\python\current\Lib\gzip.py", line 423, in _read_gzip_header
    magic = fp.read(2)
            ^^^^^^^^^^
  File "D:\Applications\scoop\apps\python\current\Lib\gzip.py", line 97, in read
    self.file.read(size-self._length+read)
  File "F:\project\python\external_ftp_pipeline\.venv\Lib\site-packages\fsspec\spec.py", line 1856, in read
    out = self.cache._fetch(self.loc, self.loc + length)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\project\python\external_ftp_pipeline\.venv\Lib\site-packages\fsspec\caching.py", line 189, in _fetch
    self.cache = self.fetcher(start, end)  # new block replaces old
                 ^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\project\python\external_ftp_pipeline\.venv\Lib\site-packages\fsspec\implementations\ftp.py", line 333, in _fetch_range
    self.fs.ftp.getmultiline()
  File "D:\Applications\scoop\apps\python\current\Lib\ftplib.py", line 230, in getmultiline
    line = self.getline()
           ^^^^^^^^^^^^^^
  File "D:\Applications\scoop\apps\python\current\Lib\ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Applications\scoop\apps\python\current\Lib\socket.py", line 706, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
TimeoutError: timed out

zzl221000 avatar Nov 08 '23 13:11 zzl221000

Thanks for reporting.

Are you sure that ABOR is happening when a file transfer is completed? It is supposed to be for the case that only part of a file has been read. Since it appears to be gzip doing the read call, it may be that you merely need to increase the size of the files blocksize to bigger than the size of the file.

Unfortunately, FTP servers are very poorly standardised. I'm not sure that there's anything we can do at the client side about i.

martindurant avatar Nov 14 '23 02:11 martindurant

After commenting the abor code, the program runs fine!

Forced by the poor standardization of ftp server, I finally chose to implement gzipreader on aioftp

zzl221000 avatar Nov 14 '23 06:11 zzl221000

After commenting the abor code, the program runs fine!

Can you please add logging to see whether the whole file is being downloaded multiple times?

martindurant avatar Nov 14 '23 16:11 martindurant

@martindurant

unchanged code log
*cmd* 'TYPE A'
*resp* '200 Type set to A'
*cmd* 'PASV'
*resp* '227 Entering Passive Mode (127,0,0,1,159,68)'
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 0 - 2
*cmd* 'MLSD /realtime_update'
*resp* '150 Starting data transfer.'
*resp* '226 Operation successful'
*cmd* 'TYPE I'
*resp* '200 Type set to I'
*cmd* 'PASV'
*resp* '227 Entering Passive Mode (127,0,0,1,159,70)'
*cmd* 'REST 0'
*resp* '350 Restarting at 0'
*cmd* 'RETR /realtime_update/2023102516.New.txt.gz'
*resp* '150 Starting data transfer.'
Traceback (most recent call last):
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\site-packages\fsspec\implementations\ftp.py", line 323, in _fetch_range
    self.fs.ftp.retrbinary(
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\ftplib.py", line 441, in retrbinary
    callback(data)
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\site-packages\fsspec\implementations\ftp.py", line 315, in callback
    raise TransferDone
fsspec.implementations.ftp.TransferDone

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\project\python\fasttext\ftptest\main.py", line 14, in <module>
    run()
  File "F:\project\python\fasttext\ftptest\main.py", line 12, in run
    f.read()
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\gzip.py", line 300, in read
    return self._buffer.read(size)
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\gzip.py", line 487, in read
    if not self._read_gzip_header():
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\gzip.py", line 430, in _read_gzip_header
    magic = self._fp.read(2)
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\gzip.py", line 96, in read
    self.file.read(size-self._length+read)
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\site-packages\fsspec\spec.py", line 1856, in read
    out = self.cache._fetch(self.loc, self.loc + length)
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\site-packages\fsspec\caching.py", line 189, in _fetch
    self.cache = self.fetcher(start, end)  # new block replaces old
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\site-packages\fsspec\implementations\ftp.py", line 333, in _fetch_range
    self.fs.ftp.getmultiline()
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\ftplib.py", line 230, in getmultiline
    line = self.getline()
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\ftplib.py", line 212, in getline
    line = self.file.readline(self.maxline + 1)
  File "E:\ProgramData\Miniconda3\envs\fasttext\lib\socket.py", line 704, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out
delete `self.fs.ftp.abort()` log
*cmd* 'TYPE A'
*resp* '200 Type set to A'
*cmd* 'PASV'
*resp* '227 Entering Passive Mode (127,0,0,1,158,221)'
*cmd* 'MLSD /realtime_update'
*resp* '150 Starting data transfer.'
*resp* '226 Operation successful'
*cmd* 'TYPE I'
*resp* '200 Type set to I'
*cmd* 'PASV'
*resp* '227 Entering Passive Mode (127,0,0,1,158,223)'
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 0 - 2
*cmd* 'REST 0'
*resp* '350 Restarting at 0'
*cmd* 'RETR /realtime_update/2023102516.New.txt.gz'
*resp* '150 Starting data transfer.'
*resp* '226 Operation successful'
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 2 - 10
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 10 - 8202
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 8202 - 11221
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 11221 - 14076
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 14076 - 16441
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 16441 - 18679
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 18679 - 20969
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 20969 - 23165
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 23165 - 25292
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 25292 - 27419
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 27419 - 30027
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 30027 - 32318
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 32318 - 34601
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 34601 - 36810
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 36810 - 39141
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 39141 - 41712
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 41712 - 44157
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 44157 - 46544
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 46544 - 48754
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 48754 - 50947
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 50947 - 53343
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 53343 - 55620
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 55620 - 57836
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 57836 - 60129
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 60129 - 62657
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 62657 - 65035
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 65035 - 67481
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 67481 - 69753
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 69753 - 72074
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 72074 - 74321
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 74321 - 76616
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 76616 - 78649
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 78649 - 80916
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 80916 - 83289
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 83289 - 85553
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 85553 - 87725
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 87725 - 89976
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 89976 - 92451
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 92451 - 94691
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 94691 - 96949
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 96949 - 99125
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 99125 - 101487
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 101487 - 103930
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 103930 - 106360
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 106360 - 108616
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 108616 - 110985
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 110985 - 113242
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 113242 - 115512
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 115512 - 117602
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 117602 - 119840
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 119840 - 122086
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 122086 - 124212
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 124212 - 126351
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 126351 - 128304
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 128304 - 130333
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 130333 - 132413
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 132413 - 134869
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 134869 - 137422
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 137422 - 139608
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 139608 - 142130
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 142130 - 144478
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 144478 - 146745
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 146745 - 148955
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 148955 - 151443
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 151443 - 153697
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 153697 - 155869
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 155869 - 158299
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 158299 - 160676
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 160676 - 162747
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 162747 - 164906
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 164906 - 167174
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 167174 - 169558
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 169558 - 171836
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 171836 - 174033
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 174033 - 176401
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 176401 - 178833
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 178833 - 181068
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 181068 - 183108
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 183108 - 185395
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 185395 - 187590
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 187590 - 190119
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 190119 - 192506
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 192506 - 194689
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 194689 - 196943
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 196943 - 199122
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 199122 - 201194
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 201194 - 203539
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 203539 - 205982
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 205982 - 208610
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 208610 - 211157
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 211157 - 213326
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 213326 - 215466
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 215466 - 217674
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 216799 - 219832
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 216799 - 221934
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 216799 - 223922
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 216799 - 216800
DEBUG:fsspec:<File-like object FTPFileSystem, /realtime_update/2023102516.New.txt.gz> read: 216799 - 216801

zzl221000 avatar Nov 15 '23 00:11 zzl221000

I guess ABOR coul be optional? I don't know how you could do random access without it, though.

martindurant avatar Nov 20 '23 17:11 martindurant