[Python: Into NLP] A Keras sentiment analysis example

Sentiment analysis is an important direction in natural language processing; the goal is for computers to understand the sentiment expressed in text. Here we use the IMDB movie-review dataset to classify whether a review judges a film to be good or bad, and use this task to explore the sentiment analysis problem.

1. The data is loaded directly with Keras' imdb.load_data() function.

2. Keras converts the positive-integer word indices into word embeddings through an embedding layer (Embedding). The embedding layer must be given the vocabulary size (the largest word index expected) and the dimensionality of each output word vector.
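Conceptually, an embedding layer is just a lookup table: a (vocabulary_size, dimension) weight matrix indexed by word id. A minimal NumPy sketch of that lookup (the matrix values here are random placeholders, not trained weights):

```python
import numpy as np

top_words = 5000    # vocabulary size
out_dimension = 32  # length of each word vector

# The embedding is a (vocab_size, dim) matrix; looking up a sequence of
# word indices returns one row per word.
rng = np.random.default_rng(7)
weights = rng.normal(size=(top_words, out_dimension))

sequence_ids = np.array([13, 4, 250, 4999])  # word indices for one review
vectors = weights[sequence_ids]
print(vectors.shape)  # (4, 32)
```

In the model below this matrix is the Embedding layer's trainable weight, so the vectors are learned during training rather than fixed.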


# -*- coding: utf-8 -*-
from keras.datasets import imdb
import numpy as np
from keras.preprocessing import sequence
from keras.layers.embeddings import Embedding
from keras.layers.convolutional import Conv1D, MaxPooling1D
from keras.layers import Dense, Flatten
from keras.models import Sequential

seed = 7
top_words = 5000
max_words = 500
out_dimension = 32
batch_size = 128
epochs = 10

def create_model():
    model = Sequential()
    # Embedding layer: one dense vector per word index
    model.add(Embedding(top_words, out_dimension, input_length=max_words))
    # 1D convolution layer
    model.add(Conv1D(filters=32, kernel_size=3, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(250, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    model.summary()
    return model

if __name__ == '__main__':
    np.random.seed(seed=seed)
    # Load the data
    (x_train, y_train), (x_validation, y_validation) = imdb.load_data(num_words=top_words)
    # Pad/truncate every review to the same length
    x_train = sequence.pad_sequences(x_train, maxlen=max_words)
    x_validation = sequence.pad_sequences(x_validation, maxlen=max_words)

    # Build and train the model
    model = create_model()
    model.fit(x_train, y_train, validation_data=(x_validation, y_validation),
              batch_size=batch_size, epochs=epochs, verbose=2)
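pad_sequences makes every review the same length: by default it pads shorter sequences with zeros at the front and truncates longer ones from the front. A small sketch of that default behaviour (a re-implementation for illustration, not the Keras code itself):

```python
import numpy as np

def pad_sequences_sketch(seqs, maxlen, value=0):
    """Mimics Keras' pad_sequences defaults: pre-padding, pre-truncating."""
    out = np.full((len(seqs), maxlen), value, dtype=int)
    for i, s in enumerate(seqs):
        trunc = s[-maxlen:]                    # keep the last maxlen tokens
        out[i, maxlen - len(trunc):] = trunc   # right-align, zero-pad the front
    return out

print(pad_sequences_sketch([[1, 2, 3], [4, 5, 6, 7, 8]], maxlen=4))
# [[0 1 2 3]
#  [5 6 7 8]]
```

With maxlen set to max_words = 500, every review becomes a fixed 500-element row, which is what the Embedding layer's input_length expects.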

Output:

Using TensorFlow backend.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (None, 500, 32)           160000    
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 500, 32)           3104      
_________________________________________________________________
max_pooling1d_1 (MaxPooling1 (None, 250, 32)           0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 8000)              0         
_________________________________________________________________
dense_1 (Dense)              (None, 250)               2000250   
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 251       
=================================================================
Total params: 2,163,605
Trainable params: 2,163,605
Non-trainable params: 0
_________________________________________________________________
Train on 25000 samples, validate on 25000 samples
Epoch 1/10
 - 31s - loss: 0.4808 - acc: 0.7374 - val_loss: 0.2800 - val_acc: 0.8843
Epoch 2/10
 - 31s - loss: 0.2234 - acc: 0.9118 - val_loss: 0.2727 - val_acc: 0.8858
Epoch 3/10
 - 33s - loss: 0.1737 - acc: 0.9339 - val_loss: 0.2918 - val_acc: 0.8807
Epoch 4/10
 - 33s - loss: 0.1293 - acc: 0.9540 - val_loss: 0.3168 - val_acc: 0.8777
Epoch 5/10
 - 35s - loss: 0.0841 - acc: 0.9744 - val_loss: 0.3721 - val_acc: 0.8751
Epoch 6/10
 - 33s - loss: 0.0450 - acc: 0.9904 - val_loss: 0.4340 - val_acc: 0.8730
Epoch 7/10
 - 32s - loss: 0.0212 - acc: 0.9966 - val_loss: 0.5029 - val_acc: 0.8703
Epoch 8/10
 - 31s - loss: 0.0085 - acc: 0.9993 - val_loss: 0.5897 - val_acc: 0.8688
Epoch 9/10
 - 31s - loss: 0.0027 - acc: 0.9998 - val_loss: 0.6597 - val_acc: 0.8694
Epoch 10/10
 - 31s - loss: 0.0013 - acc: 0.9999 - val_loss: 0.7108 - val_acc: 0.8697
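The parameter counts in the summary above can be checked by hand:

```python
# Embedding: a 32-dim vector for each of the 5000 vocabulary words
embedding = 5000 * 32         # 160000
# Conv1D: 32 filters, each spanning 3 positions x 32 input channels, + 32 biases
conv1d = 3 * 32 * 32 + 32     # 3104
# Dense(250) on the flattened 250 * 32 = 8000-dim vector, + 250 biases
dense_1 = 8000 * 250 + 250    # 2000250
# Final sigmoid unit: 250 weights + 1 bias
dense_2 = 250 + 1             # 251

total = embedding + conv1d + dense_1 + dense_2
print(total)  # 2163605
```

Note also that val_loss bottoms out at epoch 2 (0.2727) and rises thereafter while training accuracy climbs toward 1.0: the model is overfitting, so in practice training would be stopped around that point.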