Book Classifications by ML, Part 2: Feature Engineering by AutoEncoder

  1. Build a vocabulary for the text data so that words can be mapped to indices and back. It needs to implement the functions word_to_index and index_to_word.
  2. Wrap the data in a PyTorch Dataset so it can be fed to a DataLoader. It needs to implement the functions __len__, __getitem__, and collate_fn().
  3. Build an encoder module using a GRU. Its forward function stacks GRU, Linear, and activation layers, and its hidden state is passed on to the decoder module.
  4. Build a decoder module using a GRU. It takes as input the sequence one time step later than the encoder input, together with the encoder's output hidden state. Its forward function also stacks GRU, Linear, and activation layers.
  5. Connect the encoder and decoder through the hidden state.
  6. Implement a sequence-mask loss function for the model. Because the sequences are padded to the same length, the loss on the padded positions should not be counted.
  7. Train the encoder-decoder model. Its input and output are almost the same sequence, except the output is one time step later than the input, and teacher forcing is used here. Sketches of each step are given after this list.
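A minimal sketch of the vocabulary (step 1), assuming the raw data is already tokenized into lists of words. The class name Vocab, the special tokens, and the min_freq cutoff are illustrative choices, not taken from the original article.

```python
from collections import Counter

class Vocab:
    """Maps words to integer indices and back (step 1)."""
    def __init__(self, tokenized_texts, min_freq=1):
        counts = Counter(w for text in tokenized_texts for w in text)
        # Reserve special tokens: padding, unknown, begin/end of sequence.
        self.idx2word = ["<pad>", "<unk>", "<bos>", "<eos>"]
        self.idx2word += [w for w, c in counts.most_common() if c >= min_freq]
        self.word2idx = {w: i for i, w in enumerate(self.idx2word)}

    def word_to_index(self, word):
        return self.word2idx.get(word, self.word2idx["<unk>"])

    def index_to_word(self, index):
        return self.idx2word[index]

    def __len__(self):
        return len(self.idx2word)
```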
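A sketch of step 2: the indexed sequences are wrapped in a PyTorch Dataset, and the collate function pads each batch to a common length while keeping the true lengths for the masked loss later. The name BookDataset and the batch size are assumptions.

```python
import torch
from torch.utils.data import Dataset, DataLoader
from torch.nn.utils.rnn import pad_sequence

class BookDataset(Dataset):
    def __init__(self, tokenized_texts, vocab):
        # Convert each text to a list of word indices up front.
        self.sequences = [
            [vocab.word_to_index(w) for w in text] for text in tokenized_texts
        ]

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return torch.tensor(self.sequences[idx], dtype=torch.long)

def collate_fn(batch):
    # Keep the true lengths so padded positions can be ignored in the loss.
    lengths = torch.tensor([len(seq) for seq in batch])
    padded = pad_sequence(batch, batch_first=True, padding_value=0)  # 0 = <pad>
    return padded, lengths

# loader = DataLoader(dataset, batch_size=32, shuffle=True, collate_fn=collate_fn)
```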
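A sketch of steps 3 to 5: a GRU encoder whose hidden state initializes a GRU decoder. The layer sizes, the Tanh on the encoder hidden state, and the way the Linear/activation layers are stacked are assumptions; in the decoder, the softmax activation is folded into the cross-entropy loss, so only logits are returned here.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size, padding_idx=0)
        self.gru = nn.GRU(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, hidden_size)

    def forward(self, x):
        # x: (batch, time) word indices
        embedded = self.embedding(x)
        outputs, hidden = self.gru(embedded)
        # Linear + activation on top of the GRU's final hidden state (step 3);
        # this hidden state is what gets handed to the decoder (step 5).
        hidden = torch.tanh(self.fc(hidden))
        return outputs, hidden

class Decoder(nn.Module):
    def __init__(self, vocab_size, embed_size, hidden_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size, padding_idx=0)
        self.gru = nn.GRU(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, x, hidden):
        # x is the target sequence fed with teacher forcing;
        # hidden comes from the encoder (step 5).
        embedded = self.embedding(x)
        outputs, hidden = self.gru(embedded, hidden)
        logits = self.fc(outputs)  # per-step scores over the vocabulary
        return logits, hidden
```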
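A sketch of the sequence-mask loss (step 6): the per-token cross-entropy is computed with reduction="none" and the padded positions are zeroed out before averaging. Masking by the true sequence lengths is one common implementation, not necessarily the article's exact code.

```python
import torch
import torch.nn.functional as F

def sequence_mask_loss(logits, targets, lengths):
    """Cross-entropy that ignores padded time steps.

    logits:  (batch, time, vocab_size)
    targets: (batch, time) word indices
    lengths: (batch,) true sequence lengths before padding
    """
    batch, time, vocab_size = logits.shape
    # mask[b, t] is True for real tokens and False for padding.
    mask = torch.arange(time, device=targets.device)[None, :] < lengths[:, None]
    loss = F.cross_entropy(
        logits.reshape(-1, vocab_size), targets.reshape(-1), reduction="none"
    ).reshape(batch, time)
    return (loss * mask).sum() / mask.sum()
```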
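A sketch of the training loop (step 7). The decoder is fed the ground-truth sequence (teacher forcing), and the target is the same sequence one time step later, matching the shift described above. The optimizer, learning rate, and epoch count are assumptions.

```python
import torch

def train(encoder, decoder, loader, epochs=10, lr=1e-3, device="cpu"):
    encoder.to(device)
    decoder.to(device)
    params = list(encoder.parameters()) + list(decoder.parameters())
    optimizer = torch.optim.Adam(params, lr=lr)

    for epoch in range(epochs):
        for batch, lengths in loader:
            batch, lengths = batch.to(device), lengths.to(device)
            # Teacher forcing: the decoder input is the ground-truth sequence,
            # and the target is the same sequence one time step later.
            decoder_input = batch[:, :-1]
            target = batch[:, 1:]

            _, hidden = encoder(batch)                  # step 5: hand over the hidden state
            logits, _ = decoder(decoder_input, hidden)
            loss = sequence_mask_loss(logits, target, (lengths - 1).clamp(min=0))

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")
```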
When generating features with the trained encoder:
  1. Don't forget to process the data into the same format as the training data before generating features; alternatively, the format-processing function can be wrapped inside the predict function of a customized model class.
  2. The encoder class definition and the vocab class definition need to be imported so the objects can be reconstructed before the saved encoder and vocab are loaded and used.
  3. For sequence data, the batch dimension and the time-step dimension need to be rearranged, using the permute() function. A sketch of these three points follows the list.
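A sketch of the feature-generation step, assuming the trained encoder and vocab were saved as whole objects with torch.save and that the Encoder and Vocab class definitions are importable (item 2). The file names, module paths, and the choice of the encoder's final hidden state as the book feature are assumptions. Because this sketch builds the GRU with batch_first=True, permute() is applied to the layer-first hidden state to make it batch-first (item 3); a model built without batch_first would also need its input permuted.

```python
import torch
# Assumed module paths; the classes must be importable so the pickled
# objects can be reconstructed (item 2).
from model import Encoder
from vocab import Vocab

def extract_features(texts, encoder_path="encoder.pt", vocab_path="vocab.pt"):
    vocab = torch.load(vocab_path)        # newer PyTorch may need weights_only=False
    encoder = torch.load(encoder_path)
    assert isinstance(encoder, Encoder) and isinstance(vocab, Vocab)
    encoder.eval()

    features = []
    with torch.no_grad():
        for text in texts:
            # Item 1: preprocess exactly as during training.
            indices = torch.tensor([[vocab.word_to_index(w) for w in text]])
            _, hidden = encoder(indices)  # hidden: (num_layers, batch, hidden)
            # Item 3: permute so the batch dimension comes first, then drop
            # the layer dimension to get one feature vector per book.
            features.append(hidden.permute(1, 0, 2).squeeze(1))
    return torch.cat(features, dim=0)     # (num_books, hidden_size)
```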
