Description
Describe the bug
Calling the prediction method on a regression model trained with Random Forest or Gradient Boost throws an exception. The problem occurs in versions higher than 3.0.0; it does not occur in version 2.6.0.
Reproduction Steps
1. Use the provided code snippets for setting up the regression.
2. Attempt to run regression using Random Forest or Gradient Boost with versions > 3.0.0.
3. Observe the error message mentioned below.
Code Snippet
// Imports the fragment below relies on (the JUnit @Test import depends on the JUnit version in use).
import java.util.Arrays;
import smile.data.DataFrame;
import smile.data.Tuple;
import smile.data.formula.Formula;
import smile.data.vector.DoubleVector;
import smile.math.MathEx;
import smile.regression.RandomForest;
long[] seeds = {
342317953, 521642753, 72070657, 577451521, 266953217, 179976193,
374603777, 527788033, 303395329, 185759582, 261518209, 461300737,
483646580, 532528741, 159827201, 284796929, 655932697, 26390017,
454330473, 867526205, 824623361, 719082324, 334008833, 699933293,
823964929, 155216641, 150210071, 249486337, 713508520, 558398977,
886227770, 74062428, 670528514, 701250241, 363339915, 319216345,
757017601, 459643789, 170213767, 434634241, 414707201, 153100613,
753882113, 546490145, 412517763, 888761089, 628632833, 565587585,
175885057, 594903553, 78450978, 212995578, 710952449, 835852289,
415422977, 832538705, 624345857, 839826433, 260963602, 386066438,
530942946, 261866663, 269735895, 798436064, 379576194, 251582977,
349161809, 179653121, 218870401, 415292417, 86861523, 570214657,
701581299, 805955890, 358025785, 231452966, 584239408, 297276298,
371814913, 159451160, 284126095, 896291329, 496278529, 556314113,
31607297, 726761729, 217004033, 390410146, 70173193, 661580775,
633589889, 389049037, 112099159, 54041089, 80388281, 492196097,
912179201, 699398161, 482080769, 363844609, 286008078, 398098433,
339855361, 189583553, 697670495, 709568513, 98494337, 99107427,
433350529, 266601473, 888120086, 243906049, 414781441, 154685953,
601194298, 292273153, 212413697, 568007473, 666386113, 712261633,
802026964, 783034790, 188095005, 742646355, 550352897, 209421313,
175672961, 242531185, 157584001, 201363231, 760741889, 852924929,
60158977, 774572033, 311159809, 407214966, 804474160, 304456514,
54251009, 504009638, 902115329, 870383757, 487243777, 635554282,
564918017, 636074753, 870308031, 817515521, 494471884, 562424321,
81710593, 476321537, 595107841, 418699893, 315560449, 773617153,
163266399, 274201241, 290857537, 879955457, 801949697, 669025793,
753107969, 424060977, 661877468, 433391617, 222716929, 334154852,
878528257, 253742849, 480885528, 99773953, 913761493, 700407809,
483418083, 487870398, 58433153, 608046337, 475342337, 506376199,
378726401, 306604033, 724646374, 895195218, 523634541, 766543466,
190068097, 718704641, 254519245, 393943681, 796689751, 379497473,
50014340, 489234689, 129556481, 178766593, 142540536, 213594113,
870440184, 277912577};
public static final double[][] x = {
{234.289, 235.6, 159.0, 107.608, 1947, 60.323},
{259.426, 232.5, 145.6, 108.632, 1948, 61.122},
{258.054, 368.2, 161.6, 109.773, 1949, 60.171},
{284.599, 335.1, 165.0, 110.929, 1950, 61.187},
{328.975, 209.9, 309.9, 112.075, 1951, 63.221},
{346.999, 193.2, 359.4, 113.270, 1952, 63.639},
{365.385, 187.0, 354.7, 115.094, 1953, 64.989},
{363.112, 357.8, 335.0, 116.219, 1954, 63.761},
{397.469, 290.4, 304.8, 117.388, 1955, 66.019},
{419.180, 282.2, 285.7, 118.734, 1956, 67.857},
{442.769, 293.6, 279.8, 120.445, 1957, 68.169},
{444.546, 468.1, 263.7, 121.950, 1958, 66.513},
{482.704, 381.3, 255.2, 123.366, 1959, 68.655},
{502.601, 393.1, 251.4, 125.368, 1960, 69.564},
{518.173, 480.6, 257.2, 127.852, 1961, 69.331},
{554.894, 400.7, 282.7, 130.081, 1962, 70.551}
};
public static final double[] y = {
83.0, 88.5, 88.2, 89.5, 96.2, 98.1, 99.0, 100.0, 101.2,
104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9
};
public static DataFrame data = DataFrame.of(x, "GNP", "unemployed", "armed_forces", "population", "year", "employed")
        .merge(DoubleVector.of("deflator", y));
public static Formula formula = Formula.lhs("deflator");

@Test
public void tryOutRandomForest() {
    MathEx.setSeed(19650218);
    RandomForest model = RandomForest.fit(formula, data, 100, 3, 20, 10, 3, 1.0, Arrays.stream(seeds));
    for (int i = 0; i < x.length; i++) {
        // Build a tuple from the raw predictor row using the model's schema, then predict.
        System.out.println(model.predict(Tuple.of(x[i], model.schema())));
    }
}
Expected behavior
Prediction should complete successfully without any errors, as it does in version 2.6.0.
Actual behavior
An IllegalArgumentException is thrown:
Field deflator doesn't exist
java.lang.IllegalArgumentException: Field deflator doesn't exist
    at smile.data.type.StructType.indexOf(StructType.java:103)
    at smile.data.formula.Variable$1.<init>(Variable.java:80)
    at smile.data.formula.Variable.bind(Variable.java:78)
    at smile.data.formula.Formula.bind(Formula.java:360)
    at smile.data.formula.Formula.x(Formula.java:433)
    at smile.regression.RandomForest.predict(RandomForest.java:455)
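Reading the stack trace from the bottom up, predict calls Formula.x, which binds the formula to the schema of the tuple it was given, and the lookup of the response column deflator in that schema fails. My guess (an assumption, not confirmed) is that model.schema() describes only the six predictor columns, so a tuple built from it has no deflator field. The plain-Java sketch below does not use Smile at all; `indexOf` and `canBind` are hypothetical stand-ins that only mimic a name lookup in a list of field names, to illustrate the kind of failure I think is happening:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaLookupSketch {
    // Hypothetical stand-in for a schema lookup: returns the position of a
    // field name, or throws if the name is not part of the schema.
    public static int indexOf(List<String> fields, String name) {
        int i = fields.indexOf(name);
        if (i < 0) {
            throw new IllegalArgumentException("Field " + name + " doesn't exist");
        }
        return i;
    }

    // Hypothetical stand-in for binding a formula: binding succeeds only if
    // the response column can be resolved against the given schema.
    public static boolean canBind(List<String> fields, String response) {
        try {
            indexOf(fields, response);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Predictor-only schema (assumed to resemble what model.schema() returns).
        List<String> predictors = Arrays.asList(
                "GNP", "unemployed", "armed_forces", "population", "year", "employed");

        // Full schema, including the response column, as in the training data.
        List<String> full = new ArrayList<>(predictors);
        full.add("deflator");

        System.out.println(canBind(full, "deflator"));       // prints "true"
        System.out.println(canBind(predictors, "deflator")); // prints "false"
    }
}
```

If this reading is right, predicting on a tuple whose schema still contains deflator might avoid the exception, but I have not verified that.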
Additional context
- Java version 17
- OS: macOS
Request for Assistance
Could someone kindly provide insights into what might be causing this error? I'd greatly appreciate any guidance or suggestions for troubleshooting steps.