CV+ML Paper Phylogeny

An exploratory data analysis (EDA) that classifies 112,183 computer vision and machine learning papers the way biologists build a phylogenetic tree
(CVPR / NeurIPS / ICML / ICCV / ICLR / ECCV / 3DV, 1987-2025)
The papers are sorted into a 4-depth semantic taxonomy: 16 Phylum × ~110 Class × ~380 Order × Genus.

📦 112,183 papers 📅 1987-2025 🌳 4-depth taxonomy 📰 CVPR · NeurIPS · ICML · ICCV · ICLR · ECCV · 3DV
112,183 Papers
38 Years (1987-2025)
16 Phylum
~110 Class
~380 Order
2.9% Unclassified

🧬 4-depth taxonomy terms

Four levels borrowed from biological phylogenetic taxonomy. The categories narrow from top to bottom.
L1 · Phylum · 16: Major division, a broad field within computer vision and machine learning. Example: Object Detection, Generative Models, 3D Vision
L2 · Class · ~110: Middle division, a main branch within a Phylum. Example: Object Detection ▸ Anchor-free Detection
L3 · Order · ~380: Minor division, a specific topic within a Class. Example: Anchor-free Detection ▸ Query-based Detection
L4 · Genus · variable: Detail level, a concrete approach within an Order. Only the top 45 Orders have sub-rules; all others fall under (general). Example: Query-based Detection ▸ DETR variants
Why a biological taxonomy? The same topic is phrased differently from paper to paper (e.g., "image segmentation" ≈ "pixel-wise labeling"). Instead of plain keywords or TF-IDF, papers are grouped by semantic synonym clusters and mapped onto a tree that descends from field → branch → topic → approach. As a result, you can see at a glance when each field emerged or died out and which paradigms coexist within a single topic.
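The mapping described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual rule set: the cluster table, the `classify_title` helper, and the "Semantic Segmentation" Order are hypothetical, and plain substring matching stands in for whatever semantic grouping the real pipeline uses.

```python
from typing import Optional

# (Phylum, Class, Order, Genus)
TaxonPath = tuple[str, str, str, str]

# Hypothetical synonym clusters: several surface phrases map to one path.
SYNONYM_CLUSTERS: list[tuple[tuple[str, ...], TaxonPath]] = [
    (
        ("detr", "detection transformer"),
        ("Object Detection", "Anchor-free Detection",
         "Query-based Detection", "DETR variants"),
    ),
    (
        ("image segmentation", "pixel-wise labeling"),
        ("Segmentation", "Image Segmentation",
         "Semantic Segmentation", "(general)"),
    ),
]


def classify_title(title: str) -> Optional[TaxonPath]:
    """Return the first taxonomy path whose cluster matches the title."""
    lowered = title.lower()
    for phrases, path in SYNONYM_CLUSTERS:
        if any(phrase in lowered for phrase in phrases):
            return path
    return None  # would be counted as Unclassified


print(classify_title("Pixel-wise Labeling with Conditional Random Fields"))
```

Both "image segmentation" and "pixel-wise labeling" land on the same path, which is the point of clustering synonyms before counting.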

Phylum distribution

All 112,183 papers classified into 16 Phyla plus Editorial and Unclassified
15. Efficient & Robust ML: 11,993 · 10.7%
12. Training Strategies: 11,404 · 10.2%
3. 3D Vision & Reconstruction: 10,970 · 9.8%
4. Image Recognition & Retrieval: 8,090 · 7.2%
11. Deep Learning Architecture: 7,751 · 6.9%
13. Optimization & Learning Theory: 6,819 · 6.1%
7. Representation Learning: 6,619 · 5.9%
6. Generative Models & Synthesis: 6,488 · 5.8%
Other / Unclassified: 6,225 · 5.5%
5. Video & Motion Understanding: 6,215 · 5.5%
8. Vision-Language & Multimodal: 5,691 · 5.1%
14. Reinforcement Learning & Decision Making: 5,634 · 5.0%
10. Human-centric Vision: 4,506 · 4.0%
1. Object Detection & Localization: 3,859 · 3.4%
16. Application Domains: 3,655 · 3.3%
2. Segmentation: 2,967 · 2.6%
9. Low-level Vision: 2,900 · 2.6%
Other / Editorial: 397 · 0.4%

PLOT 1 Phylum × year stacked chart

X = 1987-2025, Y = papers per year (stacked). Color = 16 Phyla + Editorial + Unclassified. Hover shows the exact count.

Visible patterns and data facts
  • Early growth: a steady increase as the major CV/ML conferences expanded
  • Explosive growth in 2017-2020, then further acceleration from 2021: the deep learning and transformer era

PLOT 2 Per-Phylum small multiples

One panel for each of the 16 Phyla. Within each panel, the top 8 Classes are drawn as stacked areas. Each panel title is shown in its Phylum color.

Visible patterns and data facts
  • Object Detection: anchor-based (RCNN era) → anchor-free → transformer-based (DETR era)
  • Generative Models: GAN era → VAE → explosion of Diffusion Models from 2022

PLOT 3 Class heatmap (all ~110 Classes × year)

Rows = all Classes (grouped in Phylum order), columns = 3-year buckets. Color = log10(1 + paper count). Hover shows the Class name and exact count.

Visible patterns and data facts
  • Always active: Image Classification, Object Detection, Semantic Segmentation (red in every era)
  • Recently rising: Vision-Language Models, Diffusion Models, NeRF/3DGS (red only at the right edge)
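As a rough sketch of how one heatmap cell could be computed: bucket each paper's year into a 3-year column, count papers per (Class, bucket), and apply log10(1 + count). Aligning the buckets to start at 1987 and the `(class_name, year)` record shape are assumptions; the page does not state either.

```python
import math
from collections import Counter

START_YEAR = 1987   # assumed bucket origin (first year in the dataset)
BUCKET_SPAN = 3     # years per heatmap column


def bucket_start(year: int) -> int:
    """First year of the 3-year bucket that contains `year`."""
    return START_YEAR + ((year - START_YEAR) // BUCKET_SPAN) * BUCKET_SPAN


def heatmap_cells(records):
    """records: iterable of (class_name, year) pairs.

    Returns {(class_name, bucket_start): log10(1 + paper_count)}.
    """
    counts = Counter((cls, bucket_start(year)) for cls, year in records)
    return {key: math.log10(1 + n) for key, n in counts.items()}


demo = [("Diffusion Models", 2023)] * 4 + [("Diffusion Models", 2024)] * 5
print(heatmap_cells(demo))
```

The log scale keeps small, early Classes visible next to Classes with thousands of papers, and the `1 +` term keeps empty cells at zero rather than negative infinity.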

PLOT 4 Top 12 Class drill-down

A mini panel for each of the 12 largest Classes. Inside each panel, the top 6 Orders are drawn as stacked areas. Hover to see the Order name and count.

Visible patterns and data facts
  • Object Detection: paradigm shift from R-CNN → YOLO → DETR
  • Segmentation: semantic → instance → panoptic → SAM (Segment Anything)

🥧 Radial Tree (D3.js)

A pie-chart-style phylogenetic tree. Sidebar search highlights every wedge that contains the search term at once. Hover to emphasize a wedge; click to select a lineage, then explore with ←→↑↓.

Usage and suggested exploration
  • Search → type a term or click a suggestion chip (refresh with ↻). Matching wedges get an orange glow, and the sidebar lists the results.
  • Click a result → jump to that wedge and highlight its lineage (ancestors and descendants).
  • Hover → the wedge thickens and the sidebar shows a HOVER card.
  • ←/→ = siblings, ↑ = parent, ↓ = first child (largest child first)
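The arrow-key rules above can be modeled on a plain tree, independent of D3. This is a sketch under assumptions: the `Node` class and `navigate` function are hypothetical names, and wrapping around at the first and last sibling is a guess, since the page does not say what happens at the ends.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    name: str
    count: int = 0
    parent: Optional["Node"] = None
    children: list["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        # Largest child first, so "down" lands on the biggest wedge.
        self.children.sort(key=lambda n: n.count, reverse=True)
        return child


def navigate(node: Node, key: str) -> Node:
    """Return the node selected after pressing an arrow key."""
    if key == "up" and node.parent is not None:
        return node.parent
    if key == "down" and node.children:
        return node.children[0]
    if key in ("left", "right") and node.parent is not None:
        siblings = node.parent.children
        step = -1 if key == "left" else 1
        # Assumption: selection wraps around at either end.
        return siblings[(siblings.index(node) + step) % len(siblings)]
    return node  # no move possible; keep the current selection


root = Node("root")
det = root.add(Node("Object Detection", 3859))
seg = root.add(Node("Segmentation", 2967))
print(navigate(root, "down").name)
```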

🌳 Horizontal Collapsible Tree (D3.js)

A classic phylogenetic tree that branches from the root on the left toward the right. Click a node or its label to expand or collapse it. By default only the Phylum level is expanded; to go deeper, click a node or use the sidebar buttons.

Usage
  • Click a node or label → expand or collapse its children and set the focus (that lineage is emphasized, the rest is dimmed)
  • Hover → the sidebar HOVER card shows details in real time
  • ↑/↓ = move to a sibling under the same parent, ← = collapse, → = expand, Esc = clear the selection
  • Sidebar buttons: "Up to L2" = expand all Classes, "Up to L3" = expand through Order, "Collapse all" = Phylum only
  • A node with a blue outer ring has collapsed children

📊 #1 Emergence, peak, and 5-year growth rate by Phylum

For each of the 16 Phyla across all 112,183 papers: first-appearance year, peak year and paper count, and the 5-year growth rate from 2016-20 to 2021-25.

| Phylum | Total | First | Peak | 2016-20 | 2021-25 | Growth |
|---|---|---|---|---|---|---|
| 1. Object Detection & Localization | 3,859 | 1987 | 2024 (416) | 936 | 1,695 | +81.1% |
| 2. Segmentation | 2,967 | 1988 | 2024 (383) | 675 | 1,414 | +109.5% |
| 3. 3D Vision & Reconstruction | 10,970 | 1987 | 2024 (1,567) | 2,031 | 5,405 | +166.1% |
| 4. Image Recognition & Retrieval | 8,090 | 1987 | 2024 (636) | 1,631 | 2,569 | +57.5% |
| 5. Video & Motion Understanding | 6,215 | 1987 | 2024 (603) | 1,440 | 2,557 | +77.6% |
| 6. Generative Models & Synthesis | 6,488 | 1988 | 2024 (1,683) | 1,275 | 4,928 | +286.5% |
| 7. Representation Learning | 6,619 | 1987 | 2023 (1,002) | 1,469 | 4,097 | +178.9% |
| 8. Vision-Language & Multimodal | 5,691 | 1988 | 2024 (1,901) | 529 | 5,014 | +847.8% |
| 9. Low-level Vision | 2,900 | 1988 | 2024 (358) | 784 | 1,515 | +93.2% |
| 10. Human-centric Vision | 4,506 | 1987 | 2024 (394) | 1,356 | 1,708 | +26.0% |
| 11. Deep Learning Architecture | 7,751 | 1987 | 2024 (1,135) | 2,281 | 4,588 | +101.1% |
| 12. Training Strategies | 11,404 | 1987 | 2024 (1,472) | 2,543 | 5,678 | +123.3% |
| 13. Optimization & Learning Theory | 6,819 | 1987 | 2024 (810) | 1,746 | 3,257 | +86.5% |
| 14. Reinforcement Learning & Decision Making | 5,634 | 1987 | 2024 (873) | 1,271 | 3,543 | +178.8% |
| 15. Efficient & Robust ML | 11,993 | 1987 | 2024 (1,666) | 2,596 | 6,553 | +152.4% |
| 16. Application Domains | 3,655 | 1987 | 2024 (631) | 693 | 2,186 | +215.4% |
Key fact: Vision-Language & Multimodal leads by a wide margin with +848% growth, driven by the multimodal LLM wave (CLIP, BLIP, LLaVA, and others). Next come Generative Models (+287%) and Application Domains (+215%). All 16 Phyla show positive growth: CV+ML as a whole is accelerating, but at different rates.
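The Growth column appears to be the plain percent change between the two 5-year counts. The sketch below assumes that formula; the `growth_rate` helper is hypothetical, and the counts are copied from the table above.

```python
def growth_rate(prev_count: int, curr_count: int) -> float:
    """Percent change from the 2016-20 count to the 2021-25 count."""
    if prev_count <= 0:
        raise ValueError("previous-period count must be positive")
    return (curr_count - prev_count) / prev_count * 100.0


# (2016-20 count, 2021-25 count) for three Phyla from the table
rows = {
    "8. Vision-Language & Multimodal": (529, 5014),
    "6. Generative Models & Synthesis": (1275, 4928),
    "10. Human-centric Vision": (1356, 1708),
}

for name, (prev, curr) in rows.items():
    print(f"{name}: {growth_rate(prev, curr):+.1f}%")
```

Under this formula the three rows come out at +847.8%, +286.5%, and +26.0%, matching the table.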

📊 #2 Hottest Classes of 2020-2025: TOP 10

Top 10 Classes by paper count over the most recent five years.

| # | Phylum | Class | Papers |
|---|---|---|---|
| 1 | 12. Training Strategies | Training Techniques | 4,233 |
| 2 | 14. Reinforcement Learning & Decision Making | Reinforcement Learning | 3,621 |
| 3 | 8. Vision-Language & Multimodal | Language Model Applications | 3,016 |
| 4 | 13. Optimization & Learning Theory | Optimization Theory | 2,850 |
| 5 | 3. 3D Vision & Reconstruction | 3D Scene Understanding | 2,518 |
| 6 | 11. Deep Learning Architecture | General Deep Learning | 2,125 |
| 7 | 15. Efficient & Robust ML | Bayesian & Probabilistic Methods | 1,989 |
| 8 | 6. Generative Models & Synthesis | Diffusion Models | 1,893 |
| 9 | 3. 3D Vision & Reconstruction | Neural Implicit Representations | 1,459 |
| 10 | 2. Segmentation | Image Segmentation | 1,424 |
Key fact: the Diffusion Models Class alone accounts for 1,893 papers in five years, even though it barely existed before 2020. Together with Neural Implicit Representations (#9, the NeRF/3DGS family) and Language Model Applications (#3), 3 of the Top 10 are paradigms born after 2019.
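A ranking like this can be produced with a windowed count. The sketch below assumes each paper is a `(phylum, class_name, year)` record and keeps the window bounds as parameters, since 2020 to 2025 inclusive spans six calendar years while the text speaks of five. `top_classes` and the demo records are hypothetical.

```python
from collections import Counter


def top_classes(records, start=2020, end=2025, k=10):
    """Rank (phylum, class_name) pairs by paper count within [start, end]."""
    counts = Counter(
        (phylum, cls)
        for phylum, cls, year in records
        if start <= year <= end
    )
    return counts.most_common(k)


# Toy records, not real data
demo = (
    [("6. Generative Models & Synthesis", "Diffusion Models", 2023)] * 3
    + [("2. Segmentation", "Image Segmentation", 2021)] * 2
    + [("2. Segmentation", "Image Segmentation", 2012)] * 5
)
for (phylum, cls), n in top_classes(demo, k=2):
    print(phylum, "|", cls, "|", n)
```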

📊 #3 Vanished fields (Pre-2015 ≥ 20 papers → Post-2020 ≤ 10%)

Classes that were once active but have nearly disappeared in recent years.

| Phylum > Class | Pre-2015 | Post-2020 | Retain |
No Class meets the criterion (Pre-2015 ≥ 20 papers and Post-2020 ≤ 10%). At the Class level, every CV/ML area that was active before 2015 still maintains a meaningful publication volume.
Key fact: no field has vanished at the Class level. Unlike robotics (e.g., Visual Servoing), the CV/ML Classes that were active before 2015 (Image Classification, Optical Flow, Stereo, Face Recognition, and others) are all still active. Decline happens one level down (Order/Genus): for example, anchor-based methods within Object Detection and hand-engineered descriptors within Image Recognition were absorbed by their deep learning successors. Order-level changes can be seen in the heatmap.
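The filter behind this section can be sketched as below. It assumes that Retain means the post-2020 count divided by the pre-2015 count, which the page implies but does not define; `vanished_classes` and the toy counts are hypothetical.

```python
def vanished_classes(pre_2015, post_2020, min_pre=20, max_retain=0.10):
    """Return (class, pre, post, retain) rows that meet the criterion.

    pre_2015 / post_2020: dicts mapping class name -> paper count.
    Assumption: retain = post-2020 count / pre-2015 count.
    """
    rows = []
    for cls, pre in pre_2015.items():
        if pre < min_pre:
            continue  # too small before 2015 to count as "once active"
        post = post_2020.get(cls, 0)
        retain = post / pre
        if retain <= max_retain:
            rows.append((cls, pre, post, retain))
    return sorted(rows, key=lambda row: row[3])


# Toy counts, not real data
pre = {"Topic A": 100, "Topic B": 15, "Topic C": 50}
post = {"Topic A": 5, "Topic B": 0, "Topic C": 40}
print(vanished_classes(pre, post))
```

On the real data this filter returns an empty list at the Class level, which is what the section reports.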

📊 #4 First-appearance year of emerging categories

First paper year and cumulative count for each hot category, found by keyword matching.

| Category | First year | First paper | Total | 2021-25 |
|---|---|---|---|---|
| Diffusion Models | 2020 | Improved Techniques for Training Score-Based Generative Models (NeurIPS) | 256 | 253 |
| Vision Transformer (ViT) | 2021 | Manipulation Detection in Satellite Images Using Vision Transformer (CVPR) | 497 | 497 |
| DETR / transformer detection | 2021 | UP-DETR: Unsupervised Pre-Training for Object Detection With Transformers (CVPR) | 48 | 48 |
| NeRF / Neural Radiance Field | 2020 | NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (ECCV) | 358 | 357 |
| 3D Gaussian Splatting | 2024 | Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks (CVPR) | 290 | 290 |
| Vision-Language Models (CLIP) | 2021 | VinVL: Revisiting Visual Representations in Vision-Language Models (CVPR) | 599 | 599 |
| Segment Anything (SAM) | 2023 | Segment Anything (ICCV) | 70 | 70 |
| Foundation Models | 2022 | FETA: Towards Specializing Foundational Models for Expert Task Applications (NeurIPS) | 381 | 381 |
| Self-Supervised (modern) | 2021 | Masked autoencoder / SimCLR / MoCo / DINOv2 family | 135 | 135 |
| Mamba / State Space | 2024 | Vision Mamba / VMamba family | 63 | 63 |
| LoRA / PEFT | 2022 | Low-Rank Adaptation / Parameter-Efficient Fine-tuning family | 102 | 102 |
| ControlNet / Diffusion Edit | 2023 | InstructPix2Pix: Learning to Follow Image Editing Instructions (CVPR) | 23 | 23 |
Key fact: 2020-2021 was the turning point for modern CV+ML. NeRF, ViT, CLIP, and modern Diffusion all appeared within 18 months of each other. A second wave followed in 2022-23 (SAM, ControlNet, Foundation Models), and 3D Gaussian Splatting and Vision Mamba joined in 2024. (Because the dataset excludes SIGGRAPH, 3DGS appears one year later than its actual debut at SIGGRAPH 2023.)
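The keyword-matching pass behind this table might look roughly like the sketch below. The regular expressions, the `(title, year)` record shape, and the `first_appearance` helper are illustrative assumptions; the page does not list the actual keywords it used.

```python
import re


def first_appearance(papers, patterns):
    """papers: iterable of (title, year); patterns: {category: regex string}.

    Returns {category: {"first_year", "first_paper", "total", "recent"}},
    where "recent" counts papers published in 2021-2025.
    """
    compiled = {cat: re.compile(p, re.IGNORECASE) for cat, p in patterns.items()}
    stats = {}
    for title, year in papers:
        for cat, rx in compiled.items():
            if not rx.search(title):
                continue
            entry = stats.setdefault(
                cat,
                {"first_year": year, "first_paper": title, "total": 0, "recent": 0},
            )
            if year < entry["first_year"]:
                entry["first_year"], entry["first_paper"] = year, title
            entry["total"] += 1
            if 2021 <= year <= 2025:
                entry["recent"] += 1
    return stats


# Hypothetical keyword patterns
PATTERNS = {
    "NeRF / Neural Radiance Field": r"\bnerf\b|neural radiance field",
    "Segment Anything (SAM)": r"segment anything",
}
papers = [
    ("Mip-NeRF: A Multiscale Representation", 2021),
    ("NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis", 2020),
    ("Segment Anything", 2023),
]
print(first_appearance(papers, PATTERNS))
```

Title-only matching undercounts papers that use a technique without naming it, so totals from a pass like this are best read as lower bounds.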