Adding "a little extra" to the boxplot (boxen / swarm / violin)
Overview
- This is a memo of my thoughts on visualizing the data from Kaggle's Titanic competition.
- I want to visualize the distribution of passenger age for each port of embarkation.
- In such cases, it is common to use a box-and-whisker plot (boxplot in seaborn).
- On the other hand, **other** visualization methods can add "a little extra", so I have summarized them here.
- This time, I consider the following alternatives to seaborn's boxplot:
  - boxenplot
  - swarmplot
  - violinplot
- I hope it helps someone, but it is just a work memo and personal opinion.
 
Motivation
Boxplot
- In the Titanic data, passenger age at each port of embarkation looks like this (first, as a boxplot).
- For now, the following can be read from it:
  - The median age is around 25 to 30, whichever port the passengers boarded from.
  - There is no big difference in the median or in the first and third quartiles. (Queenstown is a little younger?)
  - Outliers (older passengers) are noticeable among those who boarded at Southampton.
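As a rough sketch, a boxplot like the one described above could be drawn as follows. The column names `Age` and `Embarked` are assumed to match Kaggle's `train.csv`; the data frame here is random placeholder data with made-up group sizes, not the real Titanic values, so the picture will not match the observations listed above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Placeholder data, NOT the real Titanic values. To reproduce the actual
# figure, replace this frame with pd.read_csv("train.csv") from Kaggle.
rng = np.random.default_rng(0)
ports = np.repeat(["S", "C", "Q"], [60, 30, 20])  # made-up group sizes
df = pd.DataFrame({
    "Embarked": ports,
    "Age": rng.normal(30, 12, size=ports.size).clip(1, 80),
})

# One box per port of embarkation, age on the y axis
fig, ax = plt.subplots()
sns.boxplot(data=df, x="Embarked", y="Age", ax=ax)
fig.savefig("boxplot_age_by_port.png")
```

The later variations only swap `sns.boxplot` for another function or add options such as `hue`.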
 
Trying a swarmplot
- If you draw this as a swarmplot instead, the quartiles become harder to read, but you gain a nice bit of "extra":
  - You become aware of the **number of data points in each series**. (In fact, **Queenstown has only a few**.)
  - It is easy to read even for people who **do not know what the box and whiskers mean**.
  - The **dense and sparse regions** of the data are easy to see.
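A minimal sketch of the swarmplot version, again on placeholder data (the group sizes are invented so that `Q` is the smallest group, mimicking the point about Queenstown). The `groupby` count is only there to show the same "number of data points" numerically.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Placeholder data, NOT the real Titanic values (assumed columns: Embarked, Age)
rng = np.random.default_rng(0)
ports = np.repeat(["S", "C", "Q"], [60, 30, 20])  # made-up group sizes
df = pd.DataFrame({
    "Embarked": ports,
    "Age": rng.normal(30, 12, size=ports.size).clip(1, 80),
})

# Number of observations per port, which the swarmplot shows visually
counts = df.groupby("Embarked")["Age"].count()
print(counts)

# One marker per passenger; size=3 keeps the markers small enough to fit
fig, ax = plt.subplots()
sns.swarmplot(data=df, x="Embarked", y="Age", size=3, ax=ax)
fig.savefig("swarmplot_age_by_port.png")
```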
 
Adding "a little extra" to the boxplot
Try changing the function and its options
Changing the plotting function or its options in this way adds "a little extra".
Putting it all together (cheat sheet)
- **boxenplot does not have a split option.**
- Note that the meaning of the **split option** differs slightly between swarmplot and violinplot.
 
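| option | boxenplot | swarmplot | violinplot |
| --- | --- | --- | --- |
| Not specified | (figure) | (figure) | (figure) |
| hue="Sex" | (figure) | (figure) | (figure) |
| hue="Sex", split=True | None | (figure) | (figure) |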
"Which" should be used "when"?
- It is hard to say "this one is for this purpose!", but if you compare them pair by pair, the characteristics of each become clear.
 
Boxplot vs boxenplot
- The names differ by only two letters (en), and the plots themselves are not very different. The point is whether you want to show **quartiles** or **finer quantiles**, and whether you want to draw attention to **outliers**.

| | Box-and-whisker plot (boxplot) | boxenplot |
| --- | --- | --- |
| display | (figure) | (figure) |
| Feature | Shows the quartiles, maximum, and minimum; outliers are also visible | Shows finer quantiles; outliers are harder to see |
Boxplot vs swarmplot
- Compared with boxplot, swarmplot keeps each individual data point in view and treats the data **continuously**.
- You can see the **number and density of the data and how they differ between series**, but the **plotting cost is high**, which makes it difficult for large amounts of data.

| | Box-and-whisker plot (boxplot) | swarmplot |
| --- | --- | --- |
| display | (figure) | (figure) |
| Feature | Captures the data as intervals (quantiles); low plotting cost | Keeps individual points in view and captures the data continuously; differences in count between series are visible; however, the plotting cost is high |
swarmplot vs violinplot
- Like swarmplot, violinplot **treats the data continuously**, and it **keeps the plotting cost down**.
- In exchange, **you lose sight of the number of data points and how it differs between series**.

| | swarmplot | violinplot |
| --- | --- | --- |
| display | (figure) | (figure) |
| Feature | Keeps individual points in view and captures the data continuously; differences in count between series are visible; however, the plotting cost is high | Does not show individual points, so the number of data points is not visible, but it captures the overall trend continuously and keeps the plotting cost down |
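A short side-by-side sketch of the two, on placeholder data with deliberately unequal (made-up) group sizes. The violins are smoothed density estimates, so the smallest group can look just as "full" as the largest, while the swarm makes the difference in count obvious.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Placeholder data, NOT the real Titanic values (assumed columns: Embarked, Age)
rng = np.random.default_rng(0)
ports = np.repeat(["S", "C", "Q"], [60, 30, 20])  # made-up group sizes
df = pd.DataFrame({
    "Embarked": ports,
    "Age": rng.normal(30, 12, size=ports.size).clip(1, 80),
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
# Left: every observation drawn as its own marker
sns.swarmplot(data=df, x="Embarked", y="Age", size=3, ax=axes[0])
# Right: a smoothed density per port; individual points are not drawn
sns.violinplot(data=df, x="Embarked", y="Age", ax=axes[1])
fig.savefig("swarmplot_vs_violinplot.png")
```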
Summary
- Each has advantages and disadvantages, and the choice should depend on the use case, but in summary it might look like this:

| Interval vs continuous | How to add "a little extra" | Which visualization method to choose |
| --- | --- | --- |
| Treat the data as **intervals (quantiles)** | If you want to draw attention to outliers | Box-and-whisker plot (boxplot) |
| | If you want a more detailed display than quartiles | boxenplot |
| Treat the data **continuously** | If you want to show its number and density | swarmplot |
| | If you want to keep the plotting cost down and show the overall trend | violinplot |