Mastering spaCy Second Edition

This is the code repository for Mastering spaCy Second Edition, published by Packt.

Build structured NLP solutions with custom components and models powered by spacy-llm

What is this book about?

Master modern NLP development with spaCy's ecosystem: from rapid prototyping with spaCy-LLM to production deployment. Learn to build custom components, integrate transformers, and manage end-to-end workflows with Weasel.

This book covers the following exciting features:

  • Apply transformer models and fine-tune them for specialized NLP tasks
  • Master spaCy core functionalities including data structures and processing pipelines
  • Develop custom pipeline components and semantic extractors for domain-specific needs
  • Build scalable applications by integrating spaCy with FastAPI, Streamlit, and Ray
  • Master advanced spaCy features including coreference resolution and neural pipeline components
  • Train domain-specific models, including NER and coreference resolution
  • Prototype rapidly with spaCy-LLM and develop custom LLM tasks

If you feel this book is for you, get your copy today! https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders, for example, chapter_01.

The code will look like the following:

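import spacy

# Load the medium English pipeline (the model must be installed first)
nlp = spacy.load("en_core_web_md")
doc = nlp("It's been a crazy week!!!")
# Print the text of each token in the processed document
print([token.text for token in doc])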

To effectively apply your understanding, make sure to execute the code in an appropriate environment with all required libraries and modules installed.
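The sample above loads en_core_web_md, which is installed separately from spaCy itself. As a minimal sketch, assuming the model was installed as a regular Python package (which is how `python -m spacy download` installs it), the following standard-library check reports whether the model can be found. The helper names `model_installed` and `download_hint` are hypothetical and are not part of the book's code.

```python
from importlib.util import find_spec


def model_installed(name: str) -> bool:
    """Return True if a package with this name can be imported."""
    return find_spec(name) is not None


def download_hint(name: str) -> str:
    """Build the spaCy CLI command that downloads the named model."""
    return f"python -m spacy download {name}"


if __name__ == "__main__":
    model = "en_core_web_md"  # model name used in the sample code above
    if model_installed(model):
        print(f"{model} is installed")
    else:
        print(f"{model} is missing; try: {download_hint(model)}")
```

Running this before opening a chapter notebook gives a quick signal about whether `spacy.load` is likely to succeed.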

Quick links

| No. | Chapter | Notebook | Colab |
|-----|---------|----------|-------|
| 01 | Getting Started with spaCy | Chapter 01 | Open In Colab |
| 02 | Core Operations with spaCy | Chapter 02 | Open In Colab |
| 03 | Extracting Linguistic Features | Chapter 03 | Open In Colab |
| 04 | Mastering Rule-Based Matching | Chapter 04 | Open In Colab |
| 05 | Extracting Semantic Representations with spaCy Pipelines | Chapter 05 | Open In Colab |
| 06 | Utilizing spaCy with Transformers | Chapter 06 | Open In Colab |
| 07 | Enhancing NLP tasks using LLMs with spacy-llm | Chapter 07 | Open In Colab |
| 08 | Training a NER Component with Your Own Data | Chapter 08 | Open In Colab |
| 09 | Creating End-to-End spaCy Workflows with Weasel | Chapter 09 | Open In Colab |
| 10 | Training an Entity Linker Model with spaCy | Chapter 10 | - |
| 11 | Integrating spaCy with Third-Party Libraries | Chapter 11 | - |

Following is what you need for this book: This book is tailored for NLP engineers, machine learning developers, and LLM engineers looking to build production-grade language processing solutions. While it primarily targets professionals working with language models and NLP pipelines, it is also valuable for software engineers transitioning into NLP development. Basic Python programming knowledge and familiarity with NLP concepts are recommended to make the most of spaCy's latest capabilities.

The following software and hardware list covers what you need to run all code files in the book (Chapters 1-11).

Software and Hardware List

| Chapter | Software required | OS required |
|---------|-------------------|-------------|
| 1-11 | Python >= 3.12 | Windows, macOS, or Linux |
| 1-11 | spaCy v3.7 | Windows, macOS, or Linux |
| 1-11 | spacy-transformers == 1.3.5 | Windows, macOS, or Linux |
| 1-11 | spacy-streamlit >= 1.0.6 | Windows, macOS, or Linux |
| 1-11 | FastAPI >= 0.112.0 | Windows, macOS, or Linux |
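The requirements listed above can be checked against the current environment before running the notebooks. The following is a minimal sketch using only the standard library. It assumes the pip distribution names `spacy`, `spacy-transformers`, `spacy-streamlit`, and `fastapi`, and it reads "spaCy v3.7" as "any 3.7.x release", which is an interpretation rather than something the list states. The version comparison is deliberately simplified, and the helpers `to_tuple`, `satisfies`, and `check` are hypothetical names.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Requirements transcribed from the list above. Reading "spaCy v3.7" as
# "any 3.7.x release" is an assumption, as are the pip distribution names.
REQUIREMENTS = {
    "spacy": ("series", "3.7"),
    "spacy-transformers": ("==", "1.3.5"),
    "spacy-streamlit": (">=", "1.0.6"),
    "fastapi": (">=", "0.112.0"),
}


def to_tuple(ver):
    """Convert '3.7.5' to (3, 7, 5); pre-release suffixes are ignored."""
    parts = []
    for piece in ver.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Simplified version check supporting '>=', '==', and 'series'."""
    have, want = to_tuple(installed), to_tuple(required)
    if op == "series":  # same leading components, e.g. 3.7.x
        return have[: len(want)] == want
    # Pad with zeros so that '0.112' compares equal to '0.112.0'.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    if op == ">=":
        return have >= want
    if op == "==":
        return have == want
    raise ValueError(f"unsupported operator: {op}")


def check(requirements):
    """Map each distribution name to a short status string."""
    report = {}
    for name, (op, required) in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = satisfies(installed, op, required)
        report[name] = f"{installed} ({'ok' if ok else f'expected {op} {required}'})"
    return report


if __name__ == "__main__":
    python_ok = sys.version_info >= (3, 12)
    print(f"python: {sys.version.split()[0]} ({'ok' if python_ok else 'expected >= 3.12'})")
    for name, status in check(REQUIREMENTS).items():
        print(f"{name}: {status}")
```

For real dependency resolution, pip and a pinned requirements file remain the more robust choice; this sketch only reports mismatches.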

Get to Know the Authors

Déborah Mesquita is a data science consultant and writer. With a BSc in computer science from UFPE, one of Brazil’s top computer science programs, she brings a diversified skill set refined through hands-on experience with various technologies. Déborah has consistently delivered exceptional results in data science projects, navigating both the business and technical sides of each one. Her ability to translate complex concepts into simple language, coupled with her quick learning and broad vision, makes her an effective educator. Actively engaged in community initiatives, she works to ensure equitable access to knowledge, reflecting her belief that technology is not a panacea, but a powerful tool for societal improvement when used for that purpose. She writes a personal blog at deborahmesquita.com.

Duygu Altinok is a senior Natural Language Processing (NLP) engineer with 12 years of experience in almost all areas of NLP, including search engine technology, speech recognition, text analytics, and conversational AI. She has published several papers in the NLP domain at conferences such as LREC and CLNLP. She also enjoys working on open source projects and is a contributor to the spaCy library. Duygu earned her undergraduate degree in computer engineering from METU, Ankara, in 2010 and her master’s degree in mathematics from Bilkent University, Ankara, in 2012. She is currently a senior engineer at German Autolabs with a focus on conversational AI for voice assistants. Originally from Istanbul, Duygu currently resides in Berlin, Germany, with her cute dog Adele.