Nayef Ahmed
## Description - This PR enables building TorchArrow with PyTorch in GitHub CI. This also allows us to test the functionality of some of the torchtext operators that were added...
## 🚀 Feature TorchData is currently an optional dependency, which means users have to `pip install torchdata` if they want to use our datasets. Since datasets are...
- URL for GloVe.840B.300d not working https://nlp.stanford.edu/data/glove.840B.300d.zip
Summary: ## Problem: pytext fails with "No module named 'pytorch'" (see issue https://github.com/facebookresearch/pytext/issues/1706). The cause is that `from pytorch.text.fb.utils import PATH_MANAGER` is internal-only but is imported in pytext. Actually, `pytorch/text/fb/utils/__init__.py` should be...
## 🐛 Bug The `test_vocab_from_raw_text_file` test is failing on CI for Linux platforms due to segmentation faults (see [sample CI job](https://app.circleci.com/pipelines/github/pytorch/text/6713/workflows/016984af-210f-4ec1-ac23-e7b31ea6f465/jobs/231623)). We currently disable the test on Linux to unblock...
The `test_download_charngram_vectors` test is failing on CI for Linux platforms with the following error: `urllib.error.HTTPError: HTTP Error 404: Not Found` (see [sample CI job](https://github.com/pytorch/text/actions/runs/4755755079/jobs/8520868662)). We currently disable the test...
## 🚀 Feature We want to add the [`LengthSetterIterDataPipe`](https://github.com/pytorch/data/blob/719616a1b4791034da3d888357e3ef62c70806e3/torchdata/datapipes/iter/util/header.py#L66-L67) to the end of all torchtext datasets. This will allow us to call `len()` on the datapipe object and prevent errors...
## Description - Currently the `fbsync` branch is [172 commits ahead](https://github.com/pytorch/text/compare/main...fbsync), [749 commits behind](https://github.com/pytorch/text/compare/fbsync...main) main. - We want to ensure that `main` and `fbsync` branches are both up to date...