Problems running example code on Apple Silicon (M1) #44

Open
sjachille opened this issue Dec 27, 2021 · 1 comment · Fixed by AAAI-DISIM-UnivAQ/nlpia#1 · May be fixed by #45

Comments


sjachille commented Dec 27, 2021

This problem concerns using nlpia on the new Apple Silicon (M1) environment with conda: the installation succeeds, but many of the example files do not run.


Hello,
I have followed every procedure I could find for installing nlpia. The installation succeeds, but many of the examples do not work (see the attached screenshots). Thank you for any help or suggestions you might have.
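For anyone trying to reproduce this, a short environment report may help narrow things down. The sketch below uses only the standard library; the function name `describe_environment` and the package list are illustrative assumptions, not part of nlpia. On an M1 Mac a native Python build would normally report `arm64` as the machine, while `x86_64` would suggest the interpreter is running under Rosetta.

```python
import importlib.util
import platform
import sys


def describe_environment(packages=("nlpia", "sklearn", "nltk", "pandas")):
    """Collect interpreter details and whether each package can be located.

    The package names are only examples; find_spec looks a package up
    without importing it, so a broken install will not crash this check.
    """
    return {
        "python": sys.version.split()[0],
        "system": platform.system(),
        "machine": platform.machine(),  # usually "arm64" for a native Apple Silicon build
        "found": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


report = describe_environment()
for key, value in report.items():
    print(key, value)
```

Pasting this output alongside the screenshots would show whether the failing examples run on a native arm64 interpreter and which of the listed packages are visible to it.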

Code for example 4.1.5 follows so you can understand and hopefully help. The code is on Page 108, Ch. 4:

import pandas as pd
from nlpia.data.loaders import get_data
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.tokenize.casual import casual_tokenize
from sklearn.preprocessing import MinMaxScaler # See P.110
from nlpia.book.examples.ch04_catdog_lsa_3x6x16 import word_topic_vectors # from page 114

pd.options.display.width = 120
sms = get_data('sms-spam')
index = ['sms{}{}'.format(i, '!'*j) for (i, j) in zip(range(len(sms)), sms.spam)]

sms = pd.DataFrame(sms.values, columns=sms.columns, index=index)
sms['spam'] = sms.spam.astype(int)

print(len(sms))

print(sms.head(6))
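The index expression in the listing above appends one `!` to the label of every spam message. A minimal sketch of that step on its own, using a hypothetical list of flags in place of the `sms.spam` column so it runs without nlpia or pandas:

```python
# Hypothetical spam flags standing in for the sms.spam column (1 = spam, 0 = ham).
spam_flags = [0, 1, 0, 1]

# Same expression as in the listing: '!' * 1 is '!' and '!' * 0 is the empty string.
index = ['sms{}{}'.format(i, '!' * j)
         for (i, j) in zip(range(len(spam_flags)), spam_flags)]

print(index)  # ['sms0', 'sms1!', 'sms2', 'sms3!']
```

If this runs but the full listing does not, the failure is more likely in `get_data` or in the package imports than in the index construction.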

# P. 108 - Now let's do our tokenization and TF-IDF vector transformation on all these SMS messages:

tfidf_model = TfidfVectorizer(tokenizer=casual_tokenize)
tfidf_docs = tfidf_model.fit_transform(raw_documents=sms.text).toarray()

print(tfidf_docs.shape, sms.spam.sum())
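To check what the TF-IDF step should produce independently of the installed packages, here is a dependency-free sketch. It assumes whitespace tokenization instead of `casual_tokenize`, a three-message toy corpus invented for illustration, and smoothed idf with L2 row normalization, which, as far as the scikit-learn documentation describes, are the `TfidfVectorizer` defaults. It is an approximation of the idea, not a reimplementation of the library.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows), one L2-normalized TF-IDF row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for doc in tokenized for tok in set(doc))
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        row = [counts[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0  # guard for empty documents
        rows.append([x / norm for x in row])
    return vocab, rows


# Toy messages, invented for this sketch.
docs = ["win a free prize now", "are we still on for lunch", "free entry win cash now"]
vocab, rows = tfidf(docs)
print((len(rows), len(vocab)))  # plays the role of tfidf_docs.shape
```

The printed pair corresponds to `tfidf_docs.shape` in the listing: one row per message and one column per vocabulary term.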

# P. 109

mask = sms.spam.astype(bool).values
spam_centroid = tfidf_docs[mask].mean(axis=0)
ham_centroid = tfidf_docs[~mask].mean(axis=0)

print(spam_centroid.round(2))
print(ham_centroid.round(2))

spamminess_score = tfidf_docs.dot(spam_centroid - ham_centroid)
print(spamminess_score.round(2))
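The centroid step above can be checked by hand on a tiny example. The sketch below is pure Python with made-up two-dimensional vectors in place of the TF-IDF matrix; it computes the mean vector of each class and projects every document onto the difference between the two means, which is what the `dot` call in the listing does.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


# Made-up 2-D "TF-IDF" vectors: the first two are spam, the last two are ham.
docs = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
is_spam = [True, True, False, False]

spam_centroid = centroid([d for d, s in zip(docs, is_spam) if s])
ham_centroid = centroid([d for d, s in zip(docs, is_spam) if not s])

# Direction pointing from the ham centroid toward the spam centroid.
direction = [s - h for s, h in zip(spam_centroid, ham_centroid)]

# Projection of each document onto that direction (a plain dot product).
scores = [sum(x * w for x, w in zip(doc, direction)) for doc in docs]
print([round(v, 2) for v in scores])
```

Spam-like vectors get positive scores and ham-like vectors negative ones, so the sign and size of `spamminess_score` in the real run can be sanity-checked the same way.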

# P. 110

sms['lda_score'] = MinMaxScaler().fit_transform(spamminess_score.reshape(-1, 1))
sms['lda_predict'] = (sms.lda_score > 0.5).astype(int)

print(sms['spam lda_predict lda_score'.split()].round(2).head(6))
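The scaling and thresholding step can also be reproduced without scikit-learn. This sketch assumes the default `MinMaxScaler` behaviour of mapping the minimum to 0 and the maximum to 1; the helper name `min_max_scale` and the input scores are hypothetical, and returning zeros for constant input is a choice made for this sketch.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values identical: nothing to spread out, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical raw scores, standing in for spamminess_score.
scores = [0.8, 0.48, -0.8, -0.48]

lda_score = min_max_scale(scores)
# Same rule as the listing: predict spam when the scaled score exceeds 0.5.
lda_predict = [int(s > 0.5) for s in lda_score]

print([round(s, 2) for s in lda_score])
print(lda_predict)  # [1, 1, 0, 0]
```

Because the threshold is applied after rescaling, a single extreme score shifts where 0.5 falls for every other message, which is worth keeping in mind when comparing output across machines.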

# Page 114, Listing 4.2

print(word_topic_vectors.T.round(1))

Pasted_Image_27_12_21__16_27
Pasted_Image_27_12_21__16_30

sjachille changed the title from "Problems running enamplecode on Apple Silicon (M!)" to "Problems running enamplecode on Apple Silicon (M1)" Dec 27, 2021
giodegas added a commit to AAAI-DISIM-UnivAQ/nlpia that referenced this issue Dec 31, 2021
Working with python3.9.5 and tensorflow-metal optimized for the M1 arm architecture, including the integrated GPU.

Fix totalgood#44 .