python-novice-gapminder "Group By"

Hello dear maintainers,

I had the chance to teach a Python session for beginners. While doing so, I found that the existing “Group By” lesson is a bit hard to understand.

So, I modified the section using the same data set, seen from a different point of view. Here is my example of doing “Group By”.

*************************************** Start ***********************************

Group By: split-apply-combine

A groupby operation involves one or more of the following steps applied to the original object:
- Splitting the object into groups
- Applying a function to each group
- Combining the results
[Figure: Split-apply-combine technique]

Source: https://cmdlinetips.com/2018/02/introduction-to-split-apply-combine-with-pandas/
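The three steps can be done by hand and compared with a single groupby call. This is a minimal sketch, assuming a small made-up DataFrame in place of gapminder_all.csv; the country names and population values are invented placeholders, not gapminder data.

```python
import pandas as pd

# Hypothetical toy data standing in for gapminder_all.csv (values are made up)
data = pd.DataFrame({
    "continent": ["Africa", "Africa", "Europe", "Europe", "Europe"],
    "country": ["A1", "A2", "E1", "E2", "E3"],
    "pop_2007": [10, 30, 5, 15, 40],
})

# Split: one sub-DataFrame per continent
pieces = {name: data[data["continent"] == name] for name in data["continent"].unique()}

# Apply: compute a summary (total population) for each piece independently
results = {name: piece["pop_2007"].sum() for name, piece in pieces.items()}

# Combine: gather the per-group results into one Series, sorted by continent
manual = pd.Series(results).sort_index()

# groupby performs all three steps in one call
automatic = data.groupby("continent")["pop_2007"].sum()

print(manual)
print(automatic)
```

Both Series hold the same totals per continent; groupby simply hides the split and combine bookkeeping.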

We use the "gapminder_all.csv" file:
- Index the data using the continent column.
- Group the data with the Pandas groupby() method, using the continent column.

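import pandas as pd
# Use the continent column as the row index
data = pd.read_csv('data/gapminder_all.csv', index_col=["continent"])
# Group rows by continent; a string key can refer to an index level name
subset = data.groupby('continent')
print(subset)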

The continent index has duplicate values, because each continent appears once for every one of its countries.
By using the count() method in Pandas, we can get the exact record count (number of countries) for each continent.

print(subset["country"].count())
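As a sketch of what count() actually counts (assuming a small made-up table rather than the real gapminder file): count() reports the number of non-missing values in each group, while size() reports the number of rows. For a column with no missing values, such as a country name column, the two agree.

```python
import pandas as pd

# Hypothetical toy data (made up), indexed by continent like the lesson's data
data = pd.DataFrame({
    "continent": ["Africa", "Africa", "Africa", "Europe", "Europe"],
    "country": ["A1", "A2", "A3", "E1", "E2"],
    "lifeExp_2007": [55.0, None, 60.0, 78.0, 80.0],
}).set_index("continent")

subset = data.groupby("continent")

# count() gives the number of non-missing values per group
country_counts = subset["country"].count()
print(country_counts)

# size() gives the number of rows per group, missing values or not
print(subset.size())

# The difference only shows up for a column with missing data
print(subset["lifeExp_2007"].count())
```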

Use the Pandas describe() method to compute summary statistics for each continent.

print(subset.groupby('continent').describe())

*************************************** End ***********************************

Conclusions

If you think the above suggestions could help, please make the necessary changes to the lesson.

Sep 30 '20 20:09 mcdperera

I also have had trouble teaching groupby - mainly because I didn't understand it well myself.

The problems I had with the example that calculates the wealth index:

- I can't quickly see the meaning of the second line. I have an easier time understanding groupby when I can predict what the answer will be.
- The numbers in the output are a jumble (they might be better sorted).
- The output is long (it takes up a lot of screen real estate).

I like this new example because it solves those problems. I love the figure. However, I don't understand a couple of the outputs.

print(subset) returns <pandas.core.groupby.generic.DataFrameGroupBy object at 0x120a86a30>. I don't mind that. It was helpful to me to see that I couldn't look at a groupby object directly. But is that what you meant?
print(subset.groupby('continent').describe()) returned an AttributeError for me

'DataFrameGroupBy' object has no attribute 'groupby'

the output of 'describe' overwhelms me. I would like to change the last line to a command that returns an easily understood DataFrame

subset.mean()

I will write a PR if I can get clarification about those things.

Dec 06 '20 19:12 eldobbins

I agree with the arguments presented by @eldobbins that the example that @mcdperera provided is better than the current one.

print(subset) returns <pandas.core.groupby.generic.DataFrameGroupBy object at 0x120a86a30>. I don't mind that. It was helpful to me to see that I couldn't look at a groupby object directly. But is that what you meant?

I guess it would be nice to include a sentence or two about DataFrameGroupBy objects, and perhaps a link to the Pandas docs entry on these.
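For example, a few standard pandas groupby features let you look inside a DataFrameGroupBy object instead of printing it: ngroups, groups, get_group() and plain iteration. This sketch uses a tiny invented table (not the gapminder data) with the continent column as the index, as in the proposed lesson.

```python
import pandas as pd

# Hypothetical toy data (made up), indexed by continent as in the proposed lesson
data = pd.DataFrame({
    "continent": ["Africa", "Africa", "Europe"],
    "country": ["A1", "A2", "E1"],
}).set_index("continent")

subset = data.groupby("continent")

# Number of groups and the label of each group
print(subset.ngroups)
print(list(subset.groups))

# The rows belonging to one group, as an ordinary DataFrame
africa = subset.get_group("Africa")
print(africa)

# Iterating yields (group name, sub-DataFrame) pairs
for name, group in subset:
    print(name, len(group))
```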

print(subset.groupby('continent').describe()) returned an AttributeError for me

I'm guessing that the OP meant subset.describe() instead, since the object is already grouped by continent?
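A minimal sketch of that corrected call, assuming a tiny invented table in place of gapminder_all.csv: because subset is already a DataFrameGroupBy, describe() is called on it directly. The result has one row per continent and a two-level column header (original column, statistic), which is part of why the output feels overwhelming.

```python
import pandas as pd

# Hypothetical toy data (made up): one text column and one numeric column
data = pd.DataFrame({
    "continent": ["Africa", "Africa", "Europe", "Europe"],
    "country": ["A1", "A2", "E1", "E2"],
    "lifeExp_2007": [50.0, 60.0, 76.0, 80.0],
}).set_index("continent")

subset = data.groupby("continent")

# subset is already grouped, so describe() is called on it directly;
# a groupby object has no .groupby attribute, hence the AttributeError
stats = subset.describe()
print(stats)

# One statistic per continent, selected from the two-level columns
print(stats[("lifeExp_2007", "mean")])
```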

the output of 'describe' overwhelms me. I would like to change the last line to a command that returns an easily understood DataFrame

subset.mean()

I will write a PR if I can get clarification about those things.

I agree that subset.mean() would be more illustrative than .describe() in this example.
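One caveat, shown as a sketch on invented data rather than the real file: gapminder_all.csv has a text country column, and in pandas 2.0 and later subset.mean() raises a TypeError when a non-numeric column is present. Passing numeric_only=True keeps only the numeric columns.

```python
import pandas as pd

# Hypothetical toy data (made up), mixing a text column with numeric columns
data = pd.DataFrame({
    "continent": ["Africa", "Africa", "Europe", "Europe"],
    "country": ["A1", "A2", "E1", "E2"],
    "gdpPercap_2007": [1000.0, 3000.0, 20000.0, 30000.0],
    "lifeExp_2007": [50.0, 60.0, 76.0, 80.0],
}).set_index("continent")

subset = data.groupby("continent")

# numeric_only=True skips the text "country" column instead of failing on it
means = subset.mean(numeric_only=True)
print(means)
```

The result is a small DataFrame with one row per continent and one column per numeric variable, which is easy to predict by hand.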

@eldobbins, if you are still inclined to write a PR, I will be happy to review it or assist you (except for actually merging; that's up to @alee 😊)

Best, V

Apr 26 '21 20:04 vinisalazar

I'd be obliged if you could do it. That was a long time ago for me! And you seem to understand what I was getting at.

Liz

Apr 26 '21 21:04 eldobbins

"Group By" - lesson improvements
