xdssio
This is an implementation of a new **Pipeline** that wraps a few standard needs together with the vaex state. General idea: any transformation you do on the dataframe as long...
1. Add a dtypes param to *get_column_names* 2. Add this type to the *getitem* method for a quick shortcut. example: ``` >>> from vaex.ml.datasets import load_titanic >>> df = load_titanic()...
filters
a standard function for text tokenisers
Using a column of lists. Obviously, a list of multiple types would be "nice to have", but mostly it's important to have a homogeneous column for pixels,...
The VW PR with current master
I found myself needing to zip columns all the time. This makes expressions look and feel a bit more like a list. 1. Implement *\_\_len\_\_* ``` len(df.x) == len(df) ```...
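A minimal sketch of the idea in plain Python, not vaex code: `ColumnLike` is a hypothetical stand-in for an expression. Giving it `__len__` and `__iter__` is enough for `len()` and `zip()` to work on it.

```python
class ColumnLike:
    """Hypothetical stand-in for a dataframe expression (not the vaex class)."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        # Lets len(column) report the number of rows.
        return len(self._values)

    def __iter__(self):
        # Lets the column be consumed by zip(), list(), for-loops, etc.
        return iter(self._values)


x = ColumnLike([1, 2, 3])
y = ColumnLike(["a", "b", "c"])
pairs = list(zip(x, y))  # row-wise tuples built from the two columns
```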
Rename multiple columns. Simply iterate and rename columns where possible. Returns the new columns that were added. I implemented it as a separate function to preserve backward compatibility. We can...
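The iterate-and-rename behaviour can be illustrated with a small sketch. This is an assumption-laden toy, not the PR's code: the table is a plain dict of column name to values, and `rename_many` is a hypothetical helper name.

```python
def rename_many(columns, mapping):
    """Rename keys of `columns` (a dict of name -> values) in place.

    Hypothetical helper: skips names that do not exist or whose target is
    already taken, and returns the list of new names that were added.
    """
    added = []
    for old, new in mapping.items():
        if old not in columns or new in columns:
            continue  # renaming is not possible, leave the column untouched
        columns[new] = columns.pop(old)
        added.append(new)
    return added


table = {"a": [1, 2], "b": [3, 4]}
# "missing" does not exist and "x" is already taken by the time "b" is tried.
added = rename_many(table, {"a": "x", "missing": "y", "b": "x"})
```

Keeping this in its own function, rather than changing an existing single-column rename, leaves existing callers unaffected.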
This PR makes iterating on the dataframe simpler, with a focus on ML. * Iteration in chunks. * Minimal user code. * Progress bar * Multiple epochs ### Core. *df.to_numpy()* and *df.to_chunks()*...
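As a rough illustration of chunked iteration over several epochs, here is a plain-Python generator. It is only a sketch under assumptions: `iter_chunks` and its parameters are hypothetical names, and the data is a list rather than a vaex dataframe.

```python
def iter_chunks(rows, chunk_size, epochs=1):
    """Yield (epoch, chunk) pairs, where each chunk is a list slice of `rows`."""
    for epoch in range(epochs):
        for start in range(0, len(rows), chunk_size):
            # The last chunk of an epoch may be shorter than chunk_size.
            yield epoch, rows[start:start + chunk_size]


batches = list(iter_chunks([0, 1, 2, 3, 4], chunk_size=2, epochs=2))
```

A training loop would consume the generator directly instead of materialising it with `list`.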
Added functionality for selecting columns by dtypes: ``` df.get_columns_by_dtypes(float) -> [columns of types float, float32, float64] df.get_columns_by_dtypes(np.float32) -> [columns of float32] df.get_columns_by_dtypes([int, str]) -> [columns of types int and str]...
The idea here is to get a quick missing values count. I have considered returning a dict, but eventually settled on a pandas series for two main reasons: a. It's...
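For comparison, this is what a per-column missing-value count looks like as a pandas Series. The example uses pandas only, with made-up column names and values, and is not the function proposed in the PR.

```python
import pandas as pd

# Toy data: one missing value in "age", two in "cabin".
frame = pd.DataFrame({"age": [22.0, None, 35.0], "cabin": [None, None, "C85"]})

# isna() marks missing cells; sum() counts them per column, giving a Series
# indexed by column name.
missing = frame.isna().sum()
```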