Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'TAHIR0110:main' into main
Showing 97 changed files with 146,441 additions and 0 deletions.
1 change: 1 addition & 0 deletions
1
...etection with CNN and EfficientNetB3/blood-cell-cancer-using-cnn-and-efficientnetb3.ipynb
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
# Feature Importance Analysis using Chi-squared and ANOVA | ||
|
||
## Introduction | ||
In this analysis, we explore how important each feature is for predicting body density. We use Principal Component Analysis (PCA) to understand the variance explained by the features, and Chi-squared and ANOVA tests to rank the features by their contribution to the label. | ||
|
||
## Feature Importance Ranking | ||
We used Chi-squared and ANOVA tests to rank the features by how strongly each one is associated with body density. | ||
|
||
### Chi-squared Test | ||
- The Chi-squared test is used to determine whether there is a significant association between categorical variables. | ||
- Because the target variable, 'Density', is continuous, the Chi-squared test is not directly applicable to it. However, if we discretize the target into bins or categories, we can use the Chi-squared test to analyze its relationship with categorical (or similarly binned) features in our dataset. | ||
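
As a minimal sketch of the discretize-then-test idea above, the following pure-Python snippet bins two continuous columns into equal-width categories and computes the Pearson Chi-squared statistic on the resulting contingency table. The helper names (`bin_values`, `chi_squared_statistic`), the choice of two bins, and the sample numbers are illustrative assumptions, not code or values from this analysis.

```python
from collections import Counter


def bin_values(values, n_bins):
    """Assign each value to one of n_bins equal-width bins, labelled 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def chi_squared_statistic(x_cats, y_cats):
    """Return (statistic, degrees_of_freedom) for two categorical sequences."""
    n = len(x_cats)
    observed = Counter(zip(x_cats, y_cats))
    x_totals = Counter(x_cats)
    y_totals = Counter(y_cats)
    stat = 0.0
    for x, x_count in x_totals.items():
        for y, y_count in y_totals.items():
            # Expected count under independence: row total * column total / n.
            expected = x_count * y_count / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    dof = (len(x_totals) - 1) * (len(y_totals) - 1)
    return stat, dof


# Made-up illustrative values, NOT rows from the real dataset.
density = [1.09, 1.08, 1.07, 1.06, 1.05, 1.04, 1.03, 1.02]
abdomen = [75, 80, 84, 88, 95, 99, 104, 110]

stat, dof = chi_squared_statistic(bin_values(abdomen, 2), bin_values(density, 2))
print(f"chi-squared = {stat:.3f}, degrees of freedom = {dof}")
```

A larger statistic for a given number of degrees of freedom suggests a stronger association. Library routines such as `scipy.stats.chi2_contingency` additionally return a p-value, and the result can depend noticeably on how the bins are chosen.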
|
||
|
||
### ANOVA Test | ||
- ANOVA compares the means of three or more groups to determine whether at least one group mean differs significantly from the others. | ||
- ANOVA can be applied to assess whether the means of our continuous features ('Weight', 'Age', 'Height', etc.) differ significantly across categories of our target variable, 'Density', once the target has been discretized into categories. | ||
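
The one-way ANOVA comparison described above can be sketched as follows: split a continuous feature into groups according to the (already discretized) target category, then compute the F-statistic as the ratio of between-group to within-group variance. The function names, the three density categories, and the sample values are assumptions made for illustration only.

```python
from collections import defaultdict


def group_by_category(values, categories):
    """Split values into lists keyed by their category label."""
    groups = defaultdict(list)
    for value, category in zip(values, categories):
        groups[category].append(value)
    return list(groups.values())


def anova_f_statistic(groups):
    """Return the one-way ANOVA F-statistic for a list of groups of numbers."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Made-up illustrative values, NOT rows from the real dataset.
weight = [150, 155, 160, 170, 176, 181, 200, 205, 212]
density_category = ["high", "high", "high", "mid", "mid", "mid", "low", "low", "low"]

f_stat = anova_f_statistic(group_by_category(weight, density_category))
print(f"F = {f_stat:.2f}")
```

Features with a larger F-statistic separate the target categories more clearly, which is the basis for ranking them. `scipy.stats.f_oneway` computes the same statistic together with a p-value.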
|
||
## Conclusion | ||
- The Chi-squared test is suitable for analyzing relationships between categorical variables, while ANOVA is suitable for comparing the means of a continuous variable across the groups defined by a categorical variable. | ||
- We conclude that 'BodyFat', 'Weight', and 'Abdomen' are the most important features for predicting body density, followed by 'Age', 'Chest', 'Hip', 'Thigh', 'Biceps', 'Knee', 'Neck', and 'Forearm'. 'Ankle', 'Height', and 'Wrist' show weaker associations with the target variable and may have less predictive power in this context. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Body Fat Prediction Dataset | ||
|
||
Welcome to the Body Fat Prediction Dataset! This dataset contains estimates of the percentage of body fat determined by underwater weighing and various body circumference measurements for 252 men. | ||
|
||
- [Body Fat Prediction Dataset](https://www.kaggle.com/datasets/fedesoriano/body-fat-prediction-dataset) | ||
|
||
## Motivation 💡 | ||
|
||
Accurate measurement of body fat is inconvenient and costly, so simpler and cheaper estimation methods are desirable. This dataset can be used to illustrate multiple regression techniques for predicting body fat percentage from easily obtained body circumference measurements. | ||
|
||
## Dataset Details 📊 | ||
|
||
The dataset includes the following variables: | ||
|
||
- Density determined from underwater weighing | ||
- Percent body fat from Siri's equation | ||
- Age (years) | ||
- Weight (lbs) | ||
- Height (inches) | ||
- Various body circumference measurements: Neck, Chest, Abdomen 2, Hip, Thigh, Knee, Ankle, Biceps, Forearm, and Wrist | ||
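
For reference, Siri's equation is commonly written as percent body fat = 495 / density - 450, with density in g/cm³. The short sketch below assumes that form and those units; the function name `siri_body_fat` is hypothetical and is not part of the dataset or its documentation.

```python
def siri_body_fat(density):
    """Estimate percent body fat from body density (assumed g/cm^3) via Siri's equation."""
    if density <= 0:
        raise ValueError("density must be positive")
    return 495.0 / density - 450.0


# Example: a density of 1.0708 g/cm^3 maps to roughly 12.3 percent body fat.
print(round(siri_body_fat(1.0708), 1))
```

Because body fat is a deterministic function of density under this equation, the two columns are almost perfectly related, which is worth keeping in mind when one is used as a feature for predicting the other.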
|
||
These measurements follow specific standards outlined in Behnke and Wilmore (1974). | ||
|
||
## Educational Use 📚 | ||
|
||
This dataset is ideal for educational purposes, particularly for illustrating multiple regression techniques in data analysis and machine learning. By utilizing the provided features, one can develop predictive models to estimate body fat percentage using simple measurement techniques. | ||
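
As one hedged illustration of the multiple regression use case, the sketch below fits an ordinary least squares model with an intercept by solving the normal equations in plain Python. The function name, the two-feature setup, and the tiny synthetic data are assumptions for demonstration; a real analysis would load the dataset and would typically use a numerical library instead.

```python
def fit_linear_regression(rows, targets):
    """Return [intercept, coef_1, ..., coef_p] minimizing squared error."""
    X = [[1.0, *row] for row in rows]  # prepend a column of ones for the intercept
    p = len(X[0])
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    b = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (b[i] - tail) / A[i][i]
    return beta


def predict(beta, row):
    """Apply fitted coefficients to one feature row."""
    return beta[0] + sum(c * v for c, v in zip(beta[1:], row))


# Synthetic data generated from y = 1 + 2*x1 + 3*x2, NOT real measurements.
features = [(1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
response = [3, 4, 6, 8, 12]

beta = fit_linear_regression(features, response)
print([round(c, 3) for c in beta])
```

With the real data, the feature rows would hold measurements such as abdomen and weight, and the response would be body fat percentage.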
|