Merge pull request #75 from JuliaAI/dev
For a 0.1.3 release
EssamWisam authored Jan 6, 2024
2 parents b03a642 + 60b4461 commit f631a95
Showing 6 changed files with 24 additions and 10 deletions.
6 changes: 5 additions & 1 deletion Project.toml
```diff
@@ -1,7 +1,8 @@
 name = "Imbalance"
 uuid = "c709b415-507b-45b7-9a3d-1767c89fde68"
 authors = ["Essam Wisam <[email protected]>", "Anthony Blaom <[email protected]> and contributors"]
-version = "0.1.2"
+version = "0.1.3"
+
 
 [deps]
 CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
@@ -24,6 +25,9 @@ Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
 TransformsBase = "28dd2a49-a57a-4bfb-84ca-1a49db9b96b8"
 
 [compat]
+LinearAlgebra="1.6"
+Random="1.6"
+Statistics="1.6"
 CategoricalArrays = "0.10"
 CategoricalDistributions = "0.1"
 Clustering = "0.15"
```
4 changes: 2 additions & 2 deletions README.md
````diff
@@ -76,7 +76,7 @@ Xover, yover = transform(mach, X, y)
 All implemented oversampling methods are considered static transforms and hence, no `fit` is required.
 
 #### Pipelining Models
-If `MLJBalancing` is also used, an arbitrary number of resampling methods from `Imbalance.jl` can be wrapped with a classification model from `MLJ` to function as a unified model where resampling automatically takes place on given data before training the model (and is bypassed during prediction).
+If [MLJBalancing](https://github.com/JuliaAI/MLJBalancing.jl) is also used, an arbitrary number of resampling methods from `Imbalance.jl` can be wrapped with a classification model from `MLJ` to function as a unified model where resampling automatically takes place on given data before training the model (and is bypassed during prediction).
 
 ```julia
 using MLJBalancing
@@ -147,4 +147,4 @@ One obvious possible remedy is to weight the smaller sums so that a learning algorithm
 To our knowledge, there are no existing maintained Julia packages that implement resampling algorithms for multi-class classification problems or that handle both nominal and continuous features. This has served as a primary motivation for the creation of this package.
 
 ## 👥 Credits
-This package was created by [Essam Wisam](https://github.com/JuliaAI) as a Google Summer of Code project, under the mentorship of [Anthony Blaom](https://ablaom.github.io). Special thanks also go to [Rik Huijzer](https://github.com/rikhuijzer) for his friendliness and the binary `SMOTE` implementation in `Resample.jl`.
+This package was created by [Essam Wisam](https://github.com/JuliaAI) as a Google Summer of Code project, under the mentorship of [Anthony Blaom](https://ablaom.github.io). Special thanks also go to [Rik Huijzer](https://github.com/rikhuijzer) for his friendliness and the binary `SMOTE` implementation in `Resample.jl`.
````
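The wrapping described in the README diff above might look roughly as follows. This is a hedged sketch, not the package's documented example: the choice of balancers and classifier, and the data `X`, `y`, are illustrative, and it assumes MLJBalancing's `BalancedModel` wrapper with keyword arguments `model`, `balancer1`, `balancer2`:

```julia
using MLJ, MLJBalancing

# illustrative choices of resamplers and classifier (assumed available in the MLJ registry)
SMOTE = @load SMOTE pkg=Imbalance
TomekUndersampler = @load TomekUndersampler pkg=Imbalance
LogisticClassifier = @load LogisticClassifier pkg=MLJLinearModels

oversampler = SMOTE(k = 5, ratios = 1.0)
undersampler = TomekUndersampler(min_ratios = 0.5)

# unified model: resampling runs on the training data before the classifier
# is fit, and is bypassed at prediction time
balanced_model = BalancedModel(model = LogisticClassifier(),
                               balancer1 = oversampler,
                               balancer2 = undersampler)
mach = machine(balanced_model, X, y)
fit!(mach)
```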
4 changes: 3 additions & 1 deletion docs/src/algorithms/implementation_notes.md
```diff
@@ -5,4 +5,6 @@ Papers often propose the resampling algorithm for the case of binary classification
 ### Generalizing to Real Ratios
 Papers often propose the resampling algorithm using integer ratios. For instance, a ratio of `2` means doubling the amount of data in a class, and a ratio of $2.2$ is either not allowed or is rounded. In `Imbalance.jl`, any appropriate real ratio can be used; the ratio is relative to the size of the majority or minority class, depending on whether the algorithm oversamples or undersamples. The generalization works by randomly choosing points instead of looping over each point. That is, if a $2.2$ ratio corresponds to $227$ examples, then $227$ examples are chosen randomly with replacement and the resampling logic is applied to each. Given an integer ratio $k$, this is on average equivalent to looping over the points $k$ times.
 
-[1] López, V., Fernández, A., Moreno-Torres, J.G., & Herrera, F. (2012). Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics. Expert Systems with Applications, 39(7), 6585–6608.
+[1] Fernández, A., López, V., Galar, M., Del Jesus, M. J., and Herrera, F. (2013). Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches. Knowledge-Based Systems, 42:97–110.
```
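The random-selection generalization described in the note above can be sketched as follows. This is an illustrative helper, not the package's internal code; plain duplication stands in for the method-specific resampling logic:

```julia
using Random

# Illustrative sketch (not Imbalance.jl internals): oversample a minority-class
# matrix X (rows are observations) so that its size becomes `ratio * majority_count`,
# where `ratio` may be any real number such as 2.2.
function oversample_by_real_ratio(X::AbstractMatrix, ratio::Real, majority_count::Int)
    n_target = round(Int, ratio * majority_count)
    n_extra = n_target - size(X, 1)
    n_extra <= 0 && return X
    # choose points randomly with replacement, then apply the resampling
    # logic to each; here that logic is plain duplication
    inds = rand(1:size(X, 1), n_extra)
    return vcat(X, X[inds, :])
end
```

For an integer ratio `k`, sampling `(k - 1) * n` indices with replacement is on average the same as visiting each of the `n` points `k - 1` extra times, which is how the real-ratio scheme falls back to the integer case.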
15 changes: 12 additions & 3 deletions docs/src/contributing.md
````diff
@@ -24,7 +24,13 @@ Any resampling method implemented in the `oversampling_methods` or `undersampling_methods`
 │ └── resample_method.jl # implements the method itself (pure functional interface)
 ```
 
-# Adding New Resampling Methods
+# Contribution
+
+
+## Reporting Problems or Seeking Support
+- Do not hesitate to post a GitHub issue with your question or problem.
+
+## Adding New Resampling Methods
 - Make a new folder `resample_method` for the method in the `oversampling_methods` or `undersampling_methods`
 - Implement in `resample_method/resample_method.jl` the method over matrices for one minority class
 - Use `generic_oversample.jl` to generalize it to work on the whole data
@@ -42,10 +48,13 @@ Of course, you can ignore the third step if the algorithm you are implementing
 - `BorderlineSMOTE2`: A small modification of the `BorderlineSMOTE1` condition
 - `RepeatedENNUndersampler`: Simply repeats `ENNUndersampler` multiple times
 
-# Adding New Tutorials
-
+## Adding New Tutorials
 - Make a new notebook with the tutorial in the `examples` folder found in `docs/src/examples`
 - Run the notebook so that the output is shown below each cell
 - If the notebook produces visuals, then save and load them in the notebook
 - Convert it to markdown by using Python to run `from convert import convert_to_md; convert_to_md('<filename>')`
 - Set a title, description, image and links for it in the dictionary found in `docs/examples.jl`
-- For the colab link, you do not need to upload anything just follow the link pattern in the file
+- For the Colab link, you do not need to upload anything; just follow the link pattern in the file
````
2 changes: 1 addition & 1 deletion docs/src/examples/Colab.md
```diff
@@ -1,6 +1,6 @@
 # Google Colab
 
-It is possible to run tutorials found in the examples section or API documentation on Google colab. It should be evident how so by launching the notebook. This section describes what happens under the hood.
+It is possible to run the tutorials found in the examples section or the API documentation on Google Colab (using the provided link or icon); launching the notebook makes this evident. This section describes what happens under the hood.
 
 - The first cell runs the following bash script to install Julia:
```
3 changes: 1 addition & 2 deletions src/common/utils.jl
```diff
@@ -35,7 +35,6 @@ where that value occurs.
 """
 function group_inds(categorical_array::AbstractVector{T}) where {T}
     result = LittleDict{T,AbstractVector{Int}}()
-    freeze(result)
     for (i, v) in enumerate(categorical_array)
         # Make a new entry in the dict if it doesn't exist
         if !haskey(result, v)
@@ -44,6 +43,6 @@ function group_inds(categorical_array::AbstractVector{T}) where {T}
         # It exists, so push the index belonging to the class
        push!(result[v], i)
     end
-    return result
+    return freeze(result)
 end
```
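To illustrate the fix, here is a self-contained sketch of the patched function together with a usage example. It assumes, as in the package, that `LittleDict` and `freeze` come from OrderedCollections.jl; the point of the patch is to build the dictionary mutably and freeze it only on return, rather than freezing before insertion:

```julia
using OrderedCollections: LittleDict, freeze

# Patched logic: group indices by value, then return an immutable (frozen) dict.
function group_inds(categorical_array::AbstractVector{T}) where {T}
    result = LittleDict{T,AbstractVector{Int}}()
    for (i, v) in enumerate(categorical_array)
        # Make a new entry in the dict if it doesn't exist
        haskey(result, v) || (result[v] = Int[])
        # Push the index belonging to the class
        push!(result[v], i)
    end
    return freeze(result)
end

group_inds(["b", "a", "b", "c"])
# maps each value to the indices where it occurs:
# "b" => [1, 3], "a" => [2], "c" => [4]
```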
