problem on prediction stage #113

Open
lordfiftysix opened this issue Oct 19, 2022 · 9 comments
@lordfiftysix commented Oct 19, 2022

I am attempting to run the following code:

x_test = pandas.read_csv('test(1).csv')
x_test = text_preprocessor.transform(x_test)
predictions = trainer.predict(X_text=x_test)

where x_test consists of a single column with text descriptions,

and I am getting the following output:

predict: 75%|███████▌ | 3/4 [00:00<00:00, 12.85it/s]
/usr/local/lib/python3.7/dist-packages/pytorch_widedeep/training/trainer.py in
--> 581 return np.vstack(preds_l).squeeze(1)

ValueError: cannot select an axis to squeeze out which has size not equal to one

I am wondering how to go about resolving this. I have already tried expanding the dims of x_test and resizing it, but I am still getting the same issue.
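
For context on the traceback: np.squeeze(axis=1) only removes an axis of size one, so np.vstack(preds_l).squeeze(1) fails whenever the stacked predictions have more than one column per sample. A minimal sketch with made-up shapes:

import numpy as np

# predictions are collected batch by batch; here each batch of predictions
# has 6 columns per row (the shapes are illustrative only)
preds_l = [np.random.rand(32, 6), np.random.rand(32, 6)]

stacked = np.vstack(preds_l)   # shape (64, 6)
# this is the call from the traceback; it fails because axis 1 has size 6
# stacked.squeeze(1)           # ValueError: cannot select an axis to squeeze out ...

# with single-column predictions the same call works and returns shape (64,)
single = np.vstack([np.random.rand(32, 1), np.random.rand(32, 1)]).squeeze(1)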

@jrzaurin (Owner) commented Oct 19, 2022

Hey @lordfiftysix,

Could you point me to some code where I can reproduce the error?

I am assuming you have trained the trainer and all that, right?

@lordfiftysix (Author)

yes

@jrzaurin (Owner) commented Oct 19, 2022

Then could you please point me to some code?

Otherwise, maybe I can try later with some dataset I might have and "report back" the results here :)

@lordfiftysix (Author)

I suppose I am fine with the second option

@jrzaurin (Owner)

Here is some fully functioning code:

from sklearn.model_selection import train_test_split

from pytorch_widedeep import Trainer
from pytorch_widedeep.datasets import load_womens_ecommerce
from pytorch_widedeep.models import BasicRNN, WideDeep
from pytorch_widedeep.preprocessing import TextPreprocessor

if __name__ == "__main__":

    df = load_womens_ecommerce(as_frame=True)

    # to be safe, but one can be more gentle here
    df = df.dropna().reset_index(drop=True)

    # just aesthetics
    df.columns = [c.lower().replace(" ", "_") for c in df.columns]

    # the reviews are a bit imbalanced, so we turned the problem into a binary
    # classification problem
    df["target"] = (df.rating >= 4).astype("int")
    text_col = "review_text"
    target = "target"

    # train/test split
    train, test = train_test_split(df, test_size=0.2, stratify=df.target)

    # processing
    text_processor = TextPreprocessor(text_col=text_col)
    X_train = text_processor.fit_transform(train)
    X_test = text_processor.transform(test)

    # model definition. The model component needs to be wrapped with the
    # WideDeep class
    basic_rnn = BasicRNN(
        vocab_size=len(text_processor.vocab.itos),
        embed_dim=100,
        hidden_dim=64,
        n_layers=3,
        bidirectional=True,
        rnn_dropout=0.5,
        padding_idx=1,
        head_hidden_dims=[100, 50],
    )
    model = WideDeep(deeptext=basic_rnn, pred_dim=1)

    # Train
    trainer = Trainer(model, objective="binary")
    trainer.fit(
        X_text=X_train,
        target=train[target].values,
        n_epochs=1,
        batch_size=256,
        val_split=0.2,
    )

    # predict
    preds = trainer.predict(X_text=X_test)

@lordfiftysix (Author) commented Oct 22, 2022

It did not work. I am trying to do multi-output regression. Here is some more of my code.

import pandas as pd
import torch

from pytorch_widedeep import Trainer
from pytorch_widedeep.models import AttentiveRNN, WideDeep
from pytorch_widedeep.preprocessing import TextPreprocessor

train_df = pd.read_csv('train.csv')

# (I also experimented with a TfidfVectorizer and a TabPreprocessor on the
# text column, but those attempts are not used below)
text_preprocessor = TextPreprocessor(text_col='full_text')

text_id = train_df['text_id']
train_df = train_df.drop(['text_id'], axis=1)
train_df = train_df.dropna().reset_index(drop=True)

x = text_preprocessor.fit_transform(train_df)
print(x)
print(x.shape)

cols = cols  # a list with the names of the 6 target columns

rmodel = AttentiveRNN(vocab_size=5741, embed_dim=80)

model = WideDeep(
    deeptext=rmodel,
    pred_dim=6,
)

wd_trainer = Trainer(
    model=model,
    objective='rmse',
    optimizers=torch.optim.AdamW(model.parameters(), lr=0.001),
)

wd_trainer.fit(
    X_text=x,
    target=train_df[cols].values,  # the 6 numerical target columns
    n_epochs=1,
    batch_size=1,
    val_split=0.2,
)

x_test = pd.read_csv('test(1).csv')
x_test = text_preprocessor.transform(x_test)

print(x_test.shape)
x_test = x_test.reshape(80, 3)
# I also tried np.expand_dims(x_test, 2), with the same result
print(x_test.shape)

df_pred = wd_trainer.predict(X_text=x_test)
print(df_pred)

And I am still getting the same error at the prediction stage.

@jrzaurin (Owner)

To do multi-output regression or multi-label classification, we would need to modify the code.

In fact, I don't know what the rmse value it outputs might mean in your case, since the library is designed to work with targets of shape (N, 1), as it is written in the docs: "Losses in this module expect the predictions and ground truth to have the same dimensions for regression and binary classification problems (N_samples, 1). In the case of multiclass classification problems the ground truth is expected to be a 1D tensor with the corresponding classes."

Anyway, if you could point me towards a notebook/Colab with some small dataset or mock data, that would save me a lot of time. Otherwise, I will try to mock some data myself and dig into this later.
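
For anyone who wants to reproduce this without the original CSVs, a minimal sketch of mock data matching the setup described above (one text column, six numeric targets; the score column names here are made up):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# one free-text column plus six numeric scores, mimicking the setup above
mock_df = pd.DataFrame(
    {"full_text": [f"some short piece of text number {i}" for i in range(n)]}
)
for j in range(6):
    mock_df[f"score_{j}"] = rng.uniform(1.0, 5.0, n).round(1)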

@lordfiftysix (Author)

Hey, I wonder if you were ever able to dig into this problem. I can confirm that I have 6 columns and a few thousand rows as my output, so RMSE probably won't work. That being said, I am struggling to do multi-output regression on these 6 target columns given a single input text column.

@jrzaurin (Owner)

Hey, sorry @lordfiftysix

I am buried at work these days, sorry for the late reply.

No, I did not have the time, sorry 🙁.

Maybe you could consider this as 6 independent problems and then combine the losses?

Alternatively, maybe you could code a custom loss yourself, although this might not be straightforward. I will see if I get a sec towards the end of the week; otherwise, I will see if @5uperpalo can look into it.

@5uperpalo let's have a chat and see if we can code a custom loss that takes multiple inputs and produces a single output.
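
For reference, here is a rough sketch of the kind of custom loss discussed here, written as a plain PyTorch module that computes RMSE per target column and averages the six values. Whether (and how) it can be passed to the Trainer, for example via a custom_loss_function argument, depends on the pytorch_widedeep version, so treat the wiring as an assumption rather than a confirmed API:

import torch
import torch.nn as nn

class MultiColumnRMSELoss(nn.Module):
    """RMSE per target column, averaged over the columns."""

    def forward(self, y_pred: torch.Tensor, y_true: torch.Tensor) -> torch.Tensor:
        # both tensors are expected to have shape (batch_size, n_targets)
        mse_per_col = torch.mean((y_pred - y_true.float()) ** 2, dim=0)
        return torch.sqrt(mse_per_col).mean()

# hypothetical wiring; check the installed version's Trainer signature first
# trainer = Trainer(model, objective="regression",
#                   custom_loss_function=MultiColumnRMSELoss())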

@jrzaurin reopened this Jul 21, 2023