xdssio issues




Pipeline proposal

This is an implementation of a new **Pipeline** that wraps a few needs common to standard solutions, together with the vaex state. General idea: any transformation you do on the dataframe, as long...

new-feature

add types to get_column_names

1. Add a dtypes param to *get_column_names*.
2. Accept the same type in the *getitem* method as a quick shortcut.

Example:

```
>>> from vaex.ml.datasets import load_titanic
>>> df = load_titanic()...
```
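A minimal sketch of the two proposed behaviours, assuming a toy `MiniFrame` class (hypothetical, not vaex's API) that stores columns as NumPy arrays and matches dtypes exactly:

```python
import numpy as np


class MiniFrame:
    """Hypothetical stand-in for a dataframe: column name -> NumPy array."""

    def __init__(self, columns):
        self.columns = {name: np.asarray(values) for name, values in columns.items()}

    def get_column_names(self, dtypes=None):
        # No filter: behave like the existing method and return every name.
        if dtypes is None:
            return list(self.columns)
        if not isinstance(dtypes, (list, tuple)):
            dtypes = [dtypes]
        wanted = [np.dtype(d) for d in dtypes]
        # Exact dtype match: np.float64 does not match float32 columns.
        return [name for name, arr in self.columns.items()
                if any(arr.dtype == w for w in wanted)]

    def __getitem__(self, key):
        # Shortcut: df[np.float64] returns a sub-frame holding only those columns.
        if isinstance(key, type):
            names = self.get_column_names(dtypes=key)
            return MiniFrame({name: self.columns[name] for name in names})
        # Otherwise fall back to plain column access by name.
        return self.columns[key]
```

For example, `MiniFrame({"age": [22.0, 38.0], "pclass": [1, 3]})[np.float64]` keeps only the `age` column.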

test for lists

Using a column of lists. Obviously, it would be nice to have lists of multiple types, but mostly it's important to have a homogeneous column for pixels,...
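A hedged sketch of what "homogeneous" could mean for a pixel column, assuming the column arrives as a plain Python list of lists; `stack_list_column` is a hypothetical helper, not part of vaex. It checks that every row has the same length and numeric content, then stacks the rows into one 2-D array:

```python
import numpy as np


def stack_list_column(rows):
    """Stack a column of equal-length numeric lists (e.g. flattened pixels) into a 2-D array."""
    if not rows:
        raise ValueError("column is empty")
    lengths = {len(row) for row in rows}
    if len(lengths) != 1:
        # A ragged column cannot become a single rectangular array.
        raise ValueError(f"rows have different lengths: {sorted(lengths)}")
    arr = np.asarray(rows)
    if not np.issubdtype(arr.dtype, np.number):
        # Pixels should be numeric; mixing numbers and strings yields a string dtype.
        raise TypeError(f"expected numeric values, got dtype {arr.dtype}")
    return arr
```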

priority: high

Expression iterator

I found myself needing to zip columns all the time. This makes expressions look and feel a bit more like a list.

1. Implement `__len__`:

```
len(df.x) == len(df)
```

...
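A minimal sketch of the list-like protocol, assuming a toy `ToyExpression` class backed by a Python list (hypothetical, not vaex's `Expression`). Defining `__len__` and `__iter__` is what lets `len()`, `zip()` and for-loops work:

```python
class ToyExpression:
    """Hypothetical column expression backed by a Python list."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        # Same number of rows as the parent dataframe.
        return len(self._values)

    def __iter__(self):
        # Yield one value per row, so zip(), list() and for-loops work.
        return iter(self._values)
```

In a real dataframe, `__iter__` would have to evaluate the expression, ideally chunk by chunk, instead of holding every value in a list.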

Rename map

Rename multiple columns. Simply iterate and rename the columns where possible; returns the new columns that were added. I have created it as a separate function to preserve backward compatibility. We can...
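A minimal sketch of the rename loop, assuming the columns live in a plain dict and that a rename is "possible" when the old name exists and the new name is not already taken (both assumptions; `rename_many` is a hypothetical name):

```python
def rename_many(columns, mapping):
    """Rename several columns in place; return the new names that were added."""
    added = []
    for old, new in mapping.items():
        # Skip renames that are not possible: missing source or taken target.
        if old not in columns or new in columns:
            continue
        columns[new] = columns.pop(old)
        added.append(new)
    return added
```

Skipping instead of raising matches the "if possible" wording; a stricter variant could raise on the first conflict.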

Iterators for ML

This PR makes iterating over the dataframe simpler, with a focus on ML:

* Iteration in chunks.
* Minimal user code.
* Progress bar.
* Multiple epochs.

### Core

*df.to_numpy()* and *df.to_chunks()*...
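A generic sketch of chunked, multi-epoch iteration over equally long NumPy arrays; `iter_chunks` is a hypothetical helper and does not reproduce the PR's `df.to_chunks()` signature:

```python
import numpy as np


def iter_chunks(arrays, chunk_size, epochs=1):
    """Yield (epoch, chunks) pairs, where chunks holds one slice per input array."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same length")
    for epoch in range(epochs):
        # Walk the rows in fixed-size windows; the last window may be shorter.
        for start in range(0, n, chunk_size):
            yield epoch, tuple(a[start:start + chunk_size] for a in arrays)
```

In a training loop this reads as `for epoch, (X_batch, y_batch) in iter_chunks([X, y], 1024, epochs=3): ...`. A progress bar could wrap the generator (for example with tqdm); it is left out here to keep the sketch dependency-free.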

Added functionality for selecting columns by dtypes:

```
df.get_columns_by_dtypes(float) -> [columns of types float, float32, float64]
df.get_columns_by_dtypes(np.float32) -> [columns of float32]
df.get_columns_by_dtypes([int, str]) -> [columns of types int and str]...
```
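One way to get these matching rules is NumPy's dtype hierarchy: map Python builtins to abstract families (`float` to `np.floating`) and let concrete types such as `np.float32` match only themselves. This is a sketch over a plain dict of arrays; the mapping table is an assumption, not the PR's code:

```python
import numpy as np

# Assumed mapping from Python builtins to NumPy abstract dtype families.
_FAMILIES = {float: np.floating, int: np.integer, str: np.str_, bool: np.bool_}


def _matches(dtype, wanted):
    # Builtins expand to a family; NumPy scalar types are used as-is.
    return np.issubdtype(dtype, _FAMILIES.get(wanted, wanted))


def get_columns_by_dtypes(columns, dtypes):
    """Return names of columns whose dtype matches any of `dtypes`."""
    if not isinstance(dtypes, (list, tuple)):
        dtypes = [dtypes]
    return [name for name, arr in columns.items()
            if any(_matches(arr.dtype, d) for d in dtypes)]
```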

Countna()

The idea here is to get a quick count of missing values. I considered returning a dict, but eventually settled on a pandas Series for two main reasons: a. It's...
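A minimal sketch of the idea, assuming columns held in a plain dict rather than a vaex dataframe: count missing values per column and return them as a pandas Series, which can then be sorted or filtered directly:

```python
import pandas as pd


def countna(columns):
    """Return a pandas Series mapping column name -> number of missing values."""
    counts = {}
    for name, values in columns.items():
        # Series.isna treats None and NaN as missing.
        counts[name] = int(pd.Series(values).isna().sum())
    return pd.Series(counts, dtype="int64")
```

For example, `countna(cols).sort_values(ascending=False)` lists the columns with the most missing values first.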
