"Group By" - lesson improvements
Hello dear maintainers,
I had the chance to teach a Python session for beginners. While teaching, I found that the existing “Group By” lesson is a bit hard to understand.
So I rewrote the section using the same data set but from a different point of view. Here is my example of doing “Group By”.
*************************************** Start ***********************************
Group By: split-apply-combine
-
Any groupby operation involves one or more of the following operations on the original object:
- Splitting the Object
- Applying a function
- Combining the results
Figure: Split-apply-combine technique

Source: https://cmdlinetips.com/2018/02/introduction-to-split-apply-combine-with-pandas/
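As an illustration of the three steps listed above, here is a minimal sketch that performs split, apply, and combine by hand and then compares the result with a single `groupby()` call. The `toy` DataFrame, its column names, and its values are hypothetical and are not taken from gapminder_all.csv.

```python
import pandas as pd

# Hypothetical toy data (made up for illustration, not from gapminder_all.csv).
toy = pd.DataFrame(
    {
        "continent": ["A", "A", "B", "B", "B"],
        "country": ["a1", "a2", "b1", "b2", "b3"],
        "value": [10, 20, 1, 2, 3],
    }
)

# Split: build one sub-DataFrame per continent.
pieces = {
    name: toy[toy["continent"] == name]
    for name in toy["continent"].unique()
}

# Apply: compute the mean of "value" within each piece.
means = {name: piece["value"].mean() for name, piece in pieces.items()}

# Combine: collect the per-group results into a single Series.
manual = pd.Series(means, name="value")
manual.index.name = "continent"

# groupby() performs the same three steps in one expression.
grouped = toy.groupby("continent")["value"].mean()

print(manual)
print(grouped)
```

Both printouts should show a mean of 15.0 for group A and 2.0 for group B, which may help learners predict the answer before running the `groupby()` line.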
- Use the "gapminder_all.csv" CSV file.
- Index the data by the continent column.
- Call the pandas "groupby()" method on the continent column.
import pandas as pd
data = pd.read_csv('data/gapminder_all.csv', index_col=["continent"])
subset = data.groupby('continent')
print(subset)
- The output above has duplicate values.
- Using the pandas 'count()' method, we can get the exact record count (number of countries) for each continent.
print(subset["country"].count())
- Use the pandas describe() method to compute summary statistics for each continent.
print(subset.groupby('continent').describe())
*************************************** End ***********************************
Conclusions
If you think these suggestions could help, please make the necessary changes to the lesson.
I also have had trouble teaching groupby - mainly because I didn't understand it well myself.
The problems I had with the example that calculates the wealth index:
- I can't quickly see the meaning of the second line. I have an easier time understanding groupby if I can predict what the answer will be
- The output numbers are a jumble (might be better sorted)
- Output is long (screen real-estate)
I like this new example because it solves those problems. I love the figure. However, I don't understand a couple of the outputs.
- print(subset) returns <pandas.core.groupby.generic.DataFrameGroupBy object at 0x120a86a30>. I don't mind that. It was helpful to me to see that I couldn't look at a groupby object directly. But is that what you meant?
- print(subset.groupby('continent').describe()) returned an AttributeError for me:
'DataFrameGroupBy' object has no attribute 'groupby'
- the output of 'describe' overwhelms me. I would like to change the last line to a command that returns an easily understood DataFrame
subset.mean()
I will write a PR if I can get clarification about those things.
I agree with the arguments presented by @eldobbins that the example that @mcdperera provided is better than the current one.
print(subset) returns <pandas.core.groupby.generic.DataFrameGroupBy object at 0x120a86a30>. I don't mind that. It was helpful to me to see that I couldn't look at a groupby object directly. But is that what you meant?
I guess it would be nice to include a sentence or two about DataFrameGroupBy objects, and perhaps a link to the Pandas docs entry on these.
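As a possible starting point for that sentence or two, here is a minimal sketch of a few ways to look inside a DataFrameGroupBy object instead of printing it directly. The `toy` DataFrame and its values are hypothetical stand-ins for the gapminder data.

```python
import pandas as pd

# Hypothetical toy data (made up for illustration, not from gapminder_all.csv).
toy = pd.DataFrame(
    {
        "continent": ["A", "A", "B", "B", "B"],
        "country": ["a1", "a2", "b1", "b2", "b3"],
        "value": [10, 20, 1, 2, 3],
    }
)

subset = toy.groupby("continent")

# Printing the object only shows its type and memory address,
# so inspect it through its attributes and methods instead.
print(type(subset).__name__)

# Number of groups that were formed.
print(subset.ngroups)

# Iterating yields (group name, sub-DataFrame) pairs.
for name, group in subset:
    print(name, len(group))

# Pull out the rows belonging to a single group.
group_b = subset.get_group("B")
print(group_b)
```

The point for learners is that a groupby object is a lazy description of the split; nothing is computed until a method such as `count()` or `mean()` is applied.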
print(subset.groupby('continent').describe()) returned an AttributeError for me
I'm guessing that OP meant subset.describe() instead, seeing as the object is already grouped by continent.
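A minimal sketch of that guess, assuming a hypothetical `toy` DataFrame indexed by continent in the same way as the proposed lesson code: calling `groupby()` a second time on the grouped object raises the AttributeError, while calling `describe()` directly on it works.

```python
import pandas as pd

# Hypothetical toy data (made up for illustration, not from gapminder_all.csv),
# indexed by continent to mirror the read_csv(..., index_col=["continent"]) call.
toy = pd.DataFrame(
    {
        "continent": ["A", "A", "B", "B", "B"],
        "value": [10, 20, 1, 2, 3],
    }
).set_index("continent")

subset = toy.groupby("continent")

# A grouped object has no groupby() method of its own.
error_message = None
try:
    subset.groupby("continent")
except AttributeError as err:
    error_message = str(err)
    print("AttributeError:", error_message)

# describe() can be called directly on the already-grouped object.
summary = subset.describe()
print(summary)
```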
- the output of 'describe' overwhelms me. I would like to change the last line to a command that returns an easily understood DataFrame
subset.mean()
I will write a PR if I can get clarification about those things.
I agree that subset.mean() would be more illustrative than .describe() in this example.
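A minimal sketch of the `mean()` variant, using a hypothetical `toy` DataFrame rather than the real file. One caveat worth checking before it goes into the lesson: to my knowledge, in recent pandas releases (2.0 and later) a plain `subset.mean()` raises a TypeError when a text column such as country is present, so the sketch passes `numeric_only=True`.

```python
import pandas as pd

# Hypothetical toy data (made up for illustration, not from gapminder_all.csv).
# The "country" column is text, like the country column in the real file.
toy = pd.DataFrame(
    {
        "continent": ["A", "A", "B", "B", "B"],
        "country": ["a1", "a2", "b1", "b2", "b3"],
        "value_1": [10, 20, 1, 2, 3],
        "value_2": [100, 300, 10, 20, 30],
    }
).set_index("continent")

subset = toy.groupby("continent")

# numeric_only=True skips the text column instead of trying to average it.
means = subset.mean(numeric_only=True)
print(means)
```

The result is a small DataFrame with one row per continent and one column per numeric variable, which is much shorter than the `describe()` output.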
@eldobbins if you are still inclined to write a PR, I will be happy to review it or assist you (except for actually merging, that's up to @alee 😊 )
Best, V
I'd be obliged if you could do it. That was a long time ago for me! And you seem to understand what I was getting at.
Liz