[FEATURE] Simple data iterator for deeplake.Dataset
🚨🚨 Feature Request
- [ ] Related to an existing Issue
- [x] A new implementation (Improvement, Extension)
Is your feature request related to a problem?
The current implementation requires TensorFlow or PyTorch to generate an iterator on Windows.
Of course, I could accomplish something like this with deeplake.Dataset.dataloader.
I would like a simple method that works identically in all environments.
For example, I envision using this feature to preprocess all of the data in sequence on the CPU.
Producing data in the same format as current Deep Lake would require some conversion step. I assume that all array data would be returned as NumPy arrays, and that all other data would be returned with appropriate native types such as str, int, list, etc.
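As an illustration, here is a sketch of that preprocessing loop, assuming the proposed Dataset.numpy() iterator existed; the dataset path and the normalize helper are placeholders:

```python
import numpy as np
import deeplake

ds = deeplake.load("./example")  # placeholder path

def normalize(image: np.ndarray) -> np.ndarray:
    # hypothetical CPU-side preprocessing step
    return image.astype(np.float32) / 255.0

# iterate over plain Python records; no TensorFlow or PyTorch required
for record in ds.numpy():  # proposed API, does not exist yet
    image = normalize(record["image"])  # np.ndarray
    tags = record["tags"]               # list of str
    caption = record["caption"]         # str
```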
Description of the possible solution
deeplake.Dataset.tensorflow() includes a generator function that yields a dictionary per record.
I imagine its implementation could be customized for this purpose; a rough sketch is below.
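A minimal sketch of what such a customized generator could look like, built only on the public Tensor.numpy() accessor; the function name iter_records and the way list/text htypes are collapsed to native Python values are assumptions on my part, not the existing tensorflow() implementation:

```python
import numpy as np

def iter_records(ds):
    """Yield one dict per sample, mirroring the tensorflow() generator pattern."""
    names = list(ds.tensors)  # ds.tensors maps tensor name -> Tensor
    for i in range(len(ds)):
        record = {}
        for name in names:
            value = ds.tensors[name][i].numpy()
            # collapse single-element string/object arrays to plain Python values
            # (how list/text htypes map to native types is an assumption here)
            if isinstance(value, np.ndarray) and value.dtype.kind in ("U", "O") and value.size == 1:
                value = value.item()
            record[name] = value
        yield record
```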
An alternative solution to the problem could look like this:
```python
import deeplake

ds = deeplake.empty("./example")
ds.create_tensor("image", htype="image.rgb")
ds.create_tensor("tags", htype="list")
ds.create_tensor("caption", htype="text")

for dict_of_tensor in ds.numpy():  # proposed API
    print(dict_of_tensor)  # {"image": np.ndarray, "tags": list of str, "caption": str}
```
Hey, I have solved this issue. Can I put up a pull request?
```python
def dict_record(self):
    # build an iterator that converts each dataloader row into a plain dict
    from deeplake.enterprise import dataloader

    return iter(map(lambda row: dict(row[0]), dataloader(self).numpy()))
```
This is the code I have added.
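For context, a minimal usage sketch, assuming dict_record is exposed as a method on deeplake.Dataset and "./example" is an existing local dataset:

```python
import deeplake

ds = deeplake.load("./example")

# each element is a plain dict of values for one sample
for record in ds.dict_record():
    print(list(record.keys()))
    break
```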
Hi @pyther-hub, absolutely! Go for it.
Sir, I have put up a pull request. Please review it.
Is something still left to be done?
Can I work on this again? @tatevikh