Angela Lin issues

Results: 30 issues of Angela Lin

Decrease runtime of lint job

#2670 introduced `pydocstyle` and `darglint` packages. The `darglint` package specifically increases the runtime of our lint job by a few minutes. While we were okay with this addition, I suspect...

enhancement

testing

performance

spike

Add util method to component graph to get components based on component class

Right now, `component_graph.get_component` expects a string that is the unique name used to find a component in the graph (e.g., "My Label Encoder", not "Label Encoder"). This makes it...

new feature

good first issue
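
A minimal sketch of what a class-based lookup could look like, next to the existing name-based one. `TinyComponentGraph`, `get_components_by_class`, and the placeholder component classes are hypothetical names used only for illustration; they are not evalml's actual implementation or API.

```python
class LabelEncoder:
    """Placeholder standing in for a real component class."""


class Imputer:
    """Placeholder standing in for a real component class."""


class TinyComponentGraph:
    """Hypothetical, simplified graph: maps unique names to component instances."""

    def __init__(self, components):
        self._components = dict(components)

    def get_component(self, name):
        # Name-based lookup, as described in the issue: requires the unique name.
        return self._components[name]

    def get_components_by_class(self, component_class):
        # Hypothetical helper: return every component that is an instance of
        # the given class, keyed by its unique name in the graph.
        return {
            name: component
            for name, component in self._components.items()
            if isinstance(component, component_class)
        }


graph = TinyComponentGraph(
    {"My Label Encoder": LabelEncoder(), "Imputer": Imputer()}
)
matches = graph.get_components_by_class(LabelEncoder)
```

Returning a mapping rather than a single component is one possible design choice, since a graph may contain several components of the same class under different names.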

Add model selection split for automl search

Rather than relying on the CV scores to rank the pipelines on the leaderboard, perhaps we should have a model selection split where we hold out some data and rank...

new feature

needs design

spike

Refactor ``test_components.py::test_describe_component``

``test_components.py::test_describe_component`` is a test that checks whether `component.describe()` returns the appropriate result. However, if a dev adds a new component, nothing requires the dev to add that...

refactor

testing

ReadtheDocs failure with AutoAPI package

I noticed a weird error in https://github.com/alteryx/evalml/pull/2546, a small PR which moved `get_hyperparameter_ranges` to `PipelineBase`. The failed ReadtheDocs build is here: https://readthedocs.com/projects/feature-labs-inc-evalml/builds/683782/, with the following error: ``` Traceback (most recent...

bug

documentation

testing

Clean up or move away from using graphviz for component graph / pipeline's `.graph()` method

We currently use graphviz to generate our graphical representation of component graphs / pipelines. https://github.com/alteryx/evalml/pull/2654 updated this representation to include X and y nodes and edges, but seems a little...

enhancement

spike

Add helper method to combine pipelines

Separating out work from https://github.com/alteryx/evalml/issues/2058, https://github.com/alteryx/evalml/pull/2968 tackled the first half of creating a preprocessing pipeline that will encompass all of the components created from data check actions. This issue will...

enhancement

new feature

tech debt

Improve implementation of ``NullDataCheck``

Follow up on https://github.com/alteryx/evalml/pull/3182 based on @freddyaboulton's comment: I think we can improve this implementation. Right now we do two scans of the data to determine the highly null columns...

refactor

good first issue

performance
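
One way the double scan might be avoided, sketched under the assumption that the check only needs each column's null fraction: compute that fraction once and derive both the flagged columns and their percentages from the same result. The function name and the `pct_null_threshold` parameter here are illustrative, not the actual `NullDataCheck` code.

```python
import pandas as pd


def find_highly_null_columns(df, pct_null_threshold=0.95):
    """Return {column: null fraction} for columns at or above the threshold."""
    # Single pass over the data: fraction of null values per column.
    null_fraction = df.isna().mean()
    # Reuse the same Series to pick the highly null columns.
    highly_null = null_fraction[null_fraction >= pct_null_threshold]
    return {col: float(frac) for col, frac in highly_null.items()}


data = pd.DataFrame(
    {
        "mostly_null": [None, None, None, 1.0],
        "complete": [1, 2, 3, 4],
        "all_null": [None] * 4,
    }
)
result = find_highly_null_columns(data, pct_null_threshold=0.75)
```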

Add feature distribution to partial dependence plots

It could be useful to add feature distribution (via histogram?) to our partial dependence plots so users can determine whether there is sufficient data to interpret the relationship between the...

enhancement

good first issue

Initialization of Woodwork DataTable using pandas DataFrame and then numpy array causes different behavior from just initialization using numpy array.

If I initialize a Woodwork DataTable using a pandas DataFrame and then initialize another Woodwork DataTable using the numpy array underneath, it creates a Woodwork DataTable with category types. However,...

bug