Build and evaluate a logistic regression model using the PySpark 3.0.1 library.

Ansu-John/Logistic-Regression-with-Spark

Logistic Regression with Spark

OBJECTIVE

The code in this repository demonstrates how to implement logistic regression with PySpark.

DATASET USED

The dataset used is uploaded to GitHub alongside the code.

TOOLS

Python, PySpark

TECHNIQUES

Logistic regression is the appropriate regression analysis when the dependent variable is dichotomous (binary). Like all regression analyses, it is a predictive analysis: it describes data and explains the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables.
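To make the relationship between the independent variables and the binary outcome concrete, here is a minimal plain-Python sketch of how a fitted logistic regression turns a weighted sum of features into a probability. The weights, bias, and feature values are made-up illustrative numbers, not coefficients from this repository's model, and the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(features, weights, bias):
    """Probability that the dependent variable equals 1 for one observation."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients and a single observation (illustration only).
weights = [0.8, -0.5]
bias = -0.2
p = predict_proba([2.0, 1.0], weights, bias)

# Classify as 1 when the probability reaches the usual 0.5 threshold.
label = 1 if p >= 0.5 else 0
print(f"P(y=1) = {p:.3f}, predicted label = {label}")
```

The 0.5 threshold is only a common default; a different cut-off may suit problems where the two kinds of error have unequal costs.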

Logistic regression can be binomial, ordinal or multinomial. Binomial (binary) logistic regression applies when the dependent variable can take only two values, coded "0" and "1" (which may represent, for example, "dead" vs. "alive" or "win" vs. "loss"). Multinomial logistic regression applies when the outcome has three or more possible types that are not ordered (e.g., "disease A" vs. "disease B" vs. "disease C"). Ordinal logistic regression applies when the categories of the dependent variable are ordered.