Data Cleaning - Missing Data
Detecting Missing Data
Detecting Missing Values with isnull() and notnull()
Visualizing Missing Data Patterns
Handling Missing Data
Dropping Rows with Missing Values
Dropping Columns with Missing Values
Filling Missing Values with a Scalar
Filling with Mean, Median, or Mode
Filling with Interpolation
Replacing Specific Values
Missing Data Strategies
Missing Data Imputation Strategies
Forward and Backward Fill Limitations
Interpolation Methods Comparison
Multivariate Imputation Concepts
Missing Indicator Variables
Dropping vs Imputing Decision Framework
Validating Imputation Results
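The missing-data topics above can be previewed with a short sketch. It assumes the course uses pandas (suggested by isnull() and notnull()); the frame, its column names and its values are hypothetical and exist only to show detection, a missing indicator, dropping, filling, a limited forward fill, and interpolation side by side.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; column names and values are illustrative only.
df = pd.DataFrame({
    "temp": [20.0, np.nan, 22.0, np.nan, np.nan, 25.0],
    "city": ["A", "B", None, "B", "A", "A"],
})

# Detecting: count missing values per column.
missing_counts = df.isnull().sum()

# Missing indicator, recorded before any imputation so the signal is not lost.
df["temp_was_missing"] = df["temp"].isnull()

# Dropping: remove every row that has at least one missing value.
dropped = df.dropna()

# Filling with a statistic (median) and with a scalar.
filled_median = df["temp"].fillna(df["temp"].median())
filled_city = df["city"].fillna("unknown")

# Forward fill with a limit: at most one consecutive gap is filled,
# so the second NaN in a run of two stays missing.
ffilled = df["temp"].ffill(limit=1)

# Linear interpolation between the nearest known neighbours.
interpolated = df["temp"].interpolate(method="linear")
```

Comparing `ffilled` with `interpolated` shows one of the forward-fill limitations listed above: a capped fill leaves long gaps partly open, while interpolation fills them with values between the known endpoints.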
Data Cleaning - Duplicates and Quality
Handling Duplicates
Keeping First or Last Occurrence
Finding Duplicates in Specific Columns
Handling Duplicates with Custom Logic
Data Validation
Checking for Impossible Values
Ensuring Data Consistency
Creating Validation Rules
Generating Data Quality Reports
Data Quality Frameworks
Defining Data Quality Rules
Implementing Validation Checks
Creating Quality Scorecards
Automated Quality Monitoring
Quality Issue Documentation
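A minimal sketch of the duplicate-handling and validation topics, again assuming pandas. The order table, the rule names, and the pass-rate "report" are hypothetical illustrations of the idea, not a specific framework from the course.

```python
import pandas as pd

# Hypothetical order records; column names are illustrative only.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "customer": ["ann", "bob", "bob", "cy", "dee"],
    "qty": [3, 1, 1, -2, 5],
    "price": [9.5, 4.0, 4.0, 3.0, None],
})

# Finding duplicates in a specific column; keep=False flags every copy.
dup_mask = orders.duplicated(subset=["order_id"], keep=False)

# Keeping the last occurrence of each order_id.
deduped = orders.drop_duplicates(subset=["order_id"], keep="last")

# Validation rules as boolean checks: True means the row passes.
rules = {
    "qty_positive": deduped["qty"] > 0,        # negative quantity is impossible
    "price_present": deduped["price"].notna(),  # price must not be missing
}

# A simple quality report: the fraction of rows passing each rule.
report = pd.Series({name: mask.mean() for name, mask in rules.items()})
```

Keeping each rule as a named boolean mask makes it easy to extend the report into a scorecard or to store the failing rows for issue documentation.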
Data Transformation Basics
Sorting and Ranking
Sorting by Multiple Columns
Sorting in Ascending and Descending Order
Sorting with Missing Values
Ranking Methods - average, min, max, first, dense
Ranking with Ascending and Descending Order
String Operations
Accessing String Methods with str
Converting Case - lower(), upper(), title()
Finding and Replacing with Regex
Extracting with Regex Patterns
Advanced String Operations
Extracting with Named Groups
Multiple Pattern Matching
Handling Special Characters
String Vectorization Performance
Type Conversion
Converting to Numeric with to_numeric()
Handling Errors in Conversion
Downcasting for Memory Optimization
Converting Multiple Columns at Once
Data Type Optimization
Understanding Memory Layout
Integer Type Optimization
String vs Category Decision
Creating Memory-Efficient Pipelines
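The transformation topics can be sketched together in one small pandas example, assuming pandas is the library in use. The product table and its columns are hypothetical; the sketch shows to_numeric() with errors="coerce", str methods with a named-group regex extract, integer downcasting, a multi-column sort with missing values placed last, two ranking methods, and a category conversion.

```python
import pandas as pd

# Hypothetical product table used only for illustration.
df = pd.DataFrame({
    "sku": ["ab-101", "CD-202", "ab-303", "ef-404"],
    "price": ["10.5", "7", "n/a", "7"],
    "region": ["east", "west", "east", "east"],
})

# Type conversion: unparseable strings become NaN with errors="coerce".
df["price"] = pd.to_numeric(df["price"], errors="coerce")

# String methods via .str, then a regex extract with named groups.
parts = df["sku"].str.upper().str.extract(r"(?P<prefix>[A-Z]+)-(?P<number>\d+)")

# Downcast the extracted digits to the smallest integer type that fits.
parts["number"] = pd.to_numeric(parts["number"], downcast="integer")

# Sorting by multiple columns, price descending, missing prices last.
sorted_df = df.sort_values(
    ["region", "price"], ascending=[True, False], na_position="last"
)

# Ranking: tied values share the lowest rank under both methods,
# but "dense" leaves no gap after a tie while "min" does.
rank_min = df["price"].rank(method="min")
rank_dense = df["price"].rank(method="dense")

# Low-cardinality strings are often cheaper to store as category.
df["region"] = df["region"].astype("category")
```

Here the extracted numbers go up to 404, so downcasting lands on int16 rather than int8; whether category actually saves memory depends on how many distinct values a column has relative to its length.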