Byte-Sized Deep Learning Series: Handling Text Data with Keras
- Created By shambhvi
- Posted on May 2nd, 2025
Description:
Unlock the secrets of working with text data using Keras!
In this 90-minute hands-on session, you'll learn essential text preprocessing steps like tokenization, padding, and vocabulary building, and see how to turn words into numbers using embedding layers.
You'll also explore the powerful TextVectorization layer and put it all together by building a simple sentiment analysis model!
If you're ready to bring language understanding into your ML skillset, this is the perfect place to start!
Duration: 90 mins
Course Code: BDT495
Learning Objectives:
After this course, you will be able to:
- Explain the challenge of text data in ML/DL
- Preprocess text with tokenization, padding, and vocabulary building
- Use embedding layers and word embeddings
- Use the Keras TextVectorization layer
Prerequisites:
Learners should be familiar with Python, TensorFlow, and Keras basics (e.g., Sequential models, layers).
Audience:
Machine learning students and practitioners who understand basic model-building and want to extend their skills to text and NLP tasks. Ideal for those who want to build their first text classification model.
Course Outline:
- The Challenge with Text Data in ML/DL
  - Why text is hard: variable length, vocabulary size, semantics
  - The goal: turn text into numeric tensors for neural networks
- Text Preprocessing: Tokenization, Padding, Vocabulary
  - Tokenization: breaking text into words or subwords
  - Vocabulary building: assigning unique integers to tokens
  - Padding: making sequences the same length
  - Hands-on: Using Tokenizer and pad_sequences
- Embedding Layers and Word Embeddings
  - Why embeddings? (dense vector representations of words)
  - Keras Embedding layer: turning tokenized input into dense vectors
  - Hands-on: Add an embedding layer to a dummy model (inspect output shapes)
- Using the Keras TextVectorization Layer
  - Overview: What is TextVectorization?
  - Preprocessing text
  - How it handles tokenization, vocabulary and sequence length
  - Hands-on: Use TextVectorization in a neural network
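As a preview of the hands-on portions in the outline, here is a minimal sketch of how TextVectorization and an Embedding layer can fit together. It is not taken from the course material: the sample sentences, the sequence length of 6, the embedding size of 8, and the untrained Dense head are all illustrative assumptions.

```python
import tensorflow as tf
from tensorflow import keras

# Toy corpus (hypothetical sentences, for illustration only).
texts = tf.constant([
    "the movie was great",
    "terrible plot and acting",
    "great acting",
])

# TextVectorization lowercases, strips punctuation, splits on whitespace,
# builds a vocabulary, and pads or truncates to a fixed length.
vectorizer = keras.layers.TextVectorization(
    max_tokens=1000,            # assumed upper bound on vocabulary size
    output_mode="int",
    output_sequence_length=6,   # assumed fixed sequence length
)
vectorizer.adapt(texts)         # learn the vocabulary from the corpus

vocab = vectorizer.get_vocabulary()
# Index 0 is reserved for padding ("") and index 1 for unknown tokens.
print("first vocabulary entries:", vocab[:4])

token_ids = vectorizer(texts)   # integer tensor, shape (3, 6)
print("token ids shape:", token_ids.shape)

# The Embedding layer maps each integer id to a dense vector.
embedding = keras.layers.Embedding(
    input_dim=vectorizer.vocabulary_size(),
    output_dim=8,               # assumed embedding size
)
vectors = embedding(token_ids)  # shape (3, 6, 8)
print("embedded shape:", vectors.shape)

# Average over the sequence axis, then apply an untrained sigmoid head.
# The scores are meaningless until the model is trained on labeled data.
pooled = keras.layers.GlobalAveragePooling1D()(vectors)          # shape (3, 8)
scores = keras.layers.Dense(1, activation="sigmoid")(pooled)     # shape (3, 1)
print("score shape:", scores.shape)
```

The shorter sentence is padded with zeros on the right, which is why every row of `token_ids` has the same length regardless of how many words the input contains.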
Training material provided: Yes (Digital format)
Hands-on Lab: Instructions will be provided for installing Jupyter Notebook and the other required Python libraries. Students who prefer not to install these tools can use Google Colaboratory instead.