[Python: Getting into NLP] HanLP: Simplified/Traditional Chinese and Pinyin Conversion

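Working with Chinese text inevitably means converting between simplified and traditional characters, and between characters and pinyin. HanLP provides very fast conversion for these tasks, built on an Aho-Corasick automaton over a double-array trie.
Below are three functions I wrote that wrap these three features, so I don't have to reinvent the wheel later.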

Official sites
GitHub: https://github.com/hankcs/HanLP
Java docs: http://hanlp.linrunsoft.com/doc/_build/html/util.html

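Advantages:

  • Beyond basic character-to-pinyin conversion, HanLP also exposes initials (shengmu), finals (yunmu), tones, tone marks, and the input-method head (first letter or first initial).
  • HanLP recognizes polyphonic characters and can also annotate traditional Chinese with pinyin.
  • Most importantly, HanLP's pattern matching has been upgraded to AhoCorasickDoubleArrayTrie, which greatly improves performance and allows millisecond-level response times.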
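
The dictionary-driven matching mentioned in the last point can be illustrated with a much simpler structure. The sketch below uses a plain nested-dict trie and a greedy longest match at each position. It is not HanLP's AhoCorasickDoubleArrayTrie, the names build_trie and convert are introduced here for illustration, and the three-entry dictionary is a made-up sample rather than HanLP's data.

```python
def build_trie(mapping):
    """Build a nested-dict trie; the key None at a node stores the replacement."""
    root = {}
    for source, target in mapping.items():
        node = root
        for ch in source:
            node = node.setdefault(ch, {})
        node[None] = target
    return root


def convert(text, trie):
    """Replace the longest dictionary entry found at each position."""
    out = []
    i = 0
    while i < len(text):
        node = trie
        match_end = None
        match_value = None
        j = i
        # Walk down the trie as far as the text allows, remembering the
        # longest complete entry seen so far.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                match_end, match_value = j, node[None]
        if match_end is None:
            out.append(text[i])  # no entry starts here; copy the character
            i += 1
        else:
            out.append(match_value)
            i = match_end
    return "".join(out)


# Hypothetical sample entries, not HanLP's dictionary.
SAMPLE_T2S = {"士多啤梨": "草莓", "後": "后", "買": "买"}
TRIE = build_trie(SAMPLE_T2S)
print(convert("買士多啤梨", TRIE))  # 买草莓
```

Matching whole words rather than single characters is what lets a converter map a regional term such as 士多啤梨 to 草莓 instead of converting it character by character.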

Installation:

pip install pyhanlp

Python usage:

# -*- coding: utf-8 -*-


from pyhanlp import HanLP


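# Convert Chinese text to pinyin
def chiness2pinyin(text):
    """
    :param text: Chinese text
    :return: space-separated pinyin without tones
    """
    pinyinList = HanLP.convertToPinyinList(text)
    pin = []
    for each in pinyinList:
        pin.append(str(each.getPinyinWithoutTone()))

    res = ' '.join(pin)
    # Characters without a pinyin reading (such as punctuation) come back
    # as 'none'; blank them out.
    res = res.replace('none', ' ')

    return res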


# Convert traditional Chinese to simplified Chinese
def convertToSimplifiedChinese(text):
    """
    :param text: traditional Chinese text
    :return: the text converted to simplified Chinese
    """
    return HanLP.convertToSimplifiedChinese(text)


# Convert simplified Chinese to traditional Chinese
def convertToTraditionalChinese(text):
    """
    :param text: simplified Chinese text
    :return: the text converted to traditional Chinese
    """
    return HanLP.convertToTraditionalChinese(text)



if __name__ == '__main__':
    text="你知道你和星星的区别么,星星在天上,你在我心里,我爱你中国!"

    print(chiness2pinyin(text))
    print(convertToSimplifiedChinese(text))
    print(convertToTraditionalChinese(text))

Output:

ni zhi dao ni he xing xing de qu bie me   xing xing zai tian shang   ni zai wo xin li   wo ai ni zhong guo  
你知道你和星星的区别么,星星在天上,你在我心里,我爱你中国!
你知道你和星星的區別麼,星星在天上,你在我心裏,我愛你中國!

Process finished with exit code 0
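
The string replace in chiness2pinyin leaves runs of spaces where punctuation used to be, as the first output line shows. One possible alternative is to drop the placeholder tokens before joining. The helper below is only a sketch that works on plain strings: join_pinyin and its placeholder parameter are names introduced here, and the value "none" is taken from the replace call in the function above.

```python
def join_pinyin(tokens, placeholder="none"):
    """Join pinyin syllables with single spaces, skipping placeholder tokens."""
    return " ".join(str(t) for t in tokens if str(t) != placeholder)


# Sample tokens as they might be collected for "我爱你,中国!"
tokens = ["wo", "ai", "ni", "none", "zhong", "guo", "none"]
print(join_pinyin(tokens))  # wo ai ni zhong guo
```

With pyhanlp, the tokens would be the values of each.getPinyinWithoutTone() collected in the loop of chiness2pinyin.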

Java usage:

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.dictionary.py.Pinyin;
import java.util.List;

public class hanNLP3 {
    public static void main(String[] args) {

        /* Pinyin conversion */

        String text = "重载不是重任";
        List<Pinyin> pinyinList = HanLP.convertToPinyinList(text);
        System.out.print("Original,");
        for (char c : text.toCharArray()) {
            System.out.printf("%c,", c);
        }
        System.out.println();

        System.out.print("Pinyin (numeric tone),");
        for (Pinyin pinyin : pinyinList) {
            System.out.printf("%s,", pinyin);
        }
        System.out.println();

        System.out.print("Pinyin (tone mark),");
        for (Pinyin pinyin : pinyinList) {
            System.out.printf("%s,", pinyin.getPinyinWithToneMark());
        }
        System.out.println();

        System.out.print("Pinyin (no tone),");
        for (Pinyin pinyin : pinyinList) {
            System.out.printf("%s,", pinyin.getPinyinWithoutTone());
        }
        System.out.println();

        System.out.print("Tone,");
        for (Pinyin pinyin : pinyinList) {
            System.out.printf("%s,", pinyin.getTone());
        }
        System.out.println();

        System.out.print("Initial,");
        for (Pinyin pinyin : pinyinList) {
            System.out.printf("%s,", pinyin.getShengmu());
        }
        System.out.println();

        System.out.print("Final,");
        for (Pinyin pinyin : pinyinList) {
            System.out.printf("%s,", pinyin.getYunmu());
        }
        System.out.println();

        System.out.print("Input-method head,");
        for (Pinyin pinyin : pinyinList) {
            System.out.printf("%s,", pinyin.getHead());
        }
        System.out.println();


        /* Simplified/traditional conversion */

        System.out.println(HanLP.convertToTraditionalChinese("用笔记本电脑写程序"));
        System.out.println(HanLP.convertToSimplifiedChinese("「以後等妳當上皇后,就能買士多啤梨慶祝了」"));

    }
}

Output:

Original,重,载,不,是,重,任,
Pinyin (numeric tone),chong2,zai3,bu2,shi4,zhong4,ren4,
Pinyin (tone mark),chóng,zǎi,bú,shì,zhòng,rèn,
Pinyin (no tone),chong,zai,bu,shi,zhong,ren,
Tone,2,3,2,4,4,4,
Initial,ch,z,b,sh,zh,r,
Final,ong,ai,u,i,ong,en,
Input-method head,ch,z,b,sh,zh,r,
用筆記本電腦寫程序
“以后等你当上皇后,就能买草莓庆祝了”

Process finished with exit code 0
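
The numeric-tone and tone-mark rows in the output above differ only in how the tone is written. As a rough illustration of that relationship, the sketch below turns a numeric-tone syllable such as chong2 into its tone-mark form using the usual Hanyu Pinyin placement rule: a or e takes the mark, then the o of ou, otherwise the last vowel. It is a standalone helper, not HanLP's implementation. The name to_tone_mark is hypothetical, and the code assumes a valid syllable ending in a digit from 1 to 5, with ü written as v.

```python
TONE_MARKS = {
    "a": "āáǎà",
    "e": "ēéěè",
    "i": "īíǐì",
    "o": "ōóǒò",
    "u": "ūúǔù",
    "ü": "ǖǘǚǜ",
}


def to_tone_mark(syllable):
    """Convert e.g. 'chong2' to 'chóng'; tone 5 (neutral) gets no mark."""
    base, tone = syllable[:-1], int(syllable[-1])
    base = base.replace("v", "ü")  # assumption: ü is typed as v
    if tone == 5:
        return base
    if "a" in base:
        pos = base.index("a")
    elif "e" in base:
        pos = base.index("e")
    elif "ou" in base:
        pos = base.index("o")
    else:
        # Otherwise the mark goes on the last vowel.
        pos = max(base.rfind(v) for v in "iouü")
    marked = TONE_MARKS[base[pos]][tone - 1]
    return base[:pos] + marked + base[pos + 1:]


syllables = ["chong2", "zai3", "bu2", "shi4", "zhong4", "ren4"]
print(",".join(to_tone_mark(s) for s in syllables))
```

Fed the numeric-tone syllables from the Java output, this reproduces the tone-mark row. In real code the getPinyinWithToneMark() accessor shown in the Java example already does the job.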