
Faculty Profile

Hu Hai, Tenure-Track Assistant Professor

Department: Department of Translation

Career and Biography

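Assistant Professor at Shanghai Jiao Tong University (SJTU). Received a PhD in Computational Linguistics (minor in Cognitive Science) from Indiana University in 2021, and BA and MA degrees in English Language and Literature from Renmin University of China. Research interests: computational linguistics, natural language processing, large language models, and cognitive science. Has published multiple papers in leading computational linguistics and linguistics journals such as Computational Linguistics, and at top NLP and AI conferences including ACL, AAAI, EMNLP, and COLING. Principal investigator of a Ministry of Education Humanities and Social Sciences Youth Project and a Shanghai Pujiang Talent Program project. Received the Highlight Paper Award at the 2024 China National Conference on Computational Linguistics (CCL).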

You can reach me at hu.hai [shift+2] sjtu.edu.cn

Personal webpage

Lab: CL Lab

Lab webpage: Computational Linguistics (CL) lab

Our lab is equipped with several deep learning servers for training and evaluating state-of-the-art language models. Currently we have two master's students and several undergraduate students working in the lab. They come from diverse backgrounds: linguistics, computer science, information engineering, translation, language studies (English, German, Chinese), etc.

We are actively recruiting students interested in computational linguistics and related areas!

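Undergraduates interested in computational linguistics, natural language processing, corpus linguistics, machine translation, or cognitive science are welcome to join our group! Students with a background in computer science, psychology, or cognitive science are especially welcome.
Students planning to pursue a master's degree (an academic master's in Linguistics or a professional master's in Translation) are welcome to contact me!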

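Teaching and Research

Funding and awards:

Principal investigator, SJTU Liberal Arts Research Innovation Cultivation Project (2023-)
Shanghai Pujiang Talent Program (2022)
Principal investigator, Ministry of Education Humanities and Social Sciences Youth Project (2022-)
CCL 2024 Highlight Paper Award (China National Conference on Computational Linguistics)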

Datasets and systems

You are welcome to use the LLM training and evaluation data we have developed, as well as our ChatGPT English essay detector.

ArguGPT detector: (2023) a ChatGPT English essay detector that predicts the probability that an English argumentative essay was generated by ChatGPT (Hugging Face link)
[new!] ZhoBLiMP: a Systematic Assessment of Language Models with Linguistic Minimal Pairs in Chinese. Chinese minimal pairs covering 15 major and 118 minor linguistic phenomena, plus 20 Chinese language models pretrained from scratch (14M to 1.4B parameters)
MELA: (2024) Multilingual Evaluation of Linguistic Acceptability; a multilingual syntactic acceptability dataset covering 10 languages: English, Chinese, Russian, Italian, German, Spanish, Japanese, French, Arabic, and Icelandic
CoLAC: (2023) Corpus of Linguistic Acceptability in Chinese; a Chinese syntactic acceptability dataset
SwordsmanImp: (2024) a benchmark for pragmatic understanding in Chinese, built on implied meanings (conversational implicatures) in the sitcom 武林外傳 (My Own Swordsman)
Cured SICK: (2023) a re-annotated version of the SICK dataset
ChineseNLIProbing: (2021) multiple probing datasets for Chinese NLI, including Chinese HANS, expanded diagnostics, etc.
OCNLI: (2020) Original Chinese Natural Language Inference dataset
CLUE: (2020) Chinese Language Understanding Evaluation (CLUE) benchmark
FewCLUE: (2021) Few-shot CLUE benchmark
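Acceptability datasets such as CoLAC and MELA pose a binary task (acceptable vs. unacceptable), and CoLA-style benchmarks conventionally score it with the Matthews correlation coefficient (MCC), which stays informative when one label dominates. A minimal sketch of the metric in plain Python follows, assuming labels are encoded as 1 (acceptable) and 0 (unacceptable); the toy labels are invented for illustration and are not drawn from either dataset.

```python
import math

def matthews_corrcoef(gold, pred):
    """Matthews correlation coefficient for binary labels (1 = acceptable, 0 = unacceptable)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    tn = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 0)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is 0 when any marginal count is zero (e.g., a constant predictor).
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Invented toy labels: gold judgements vs. a hypothetical model's predictions.
gold = [1, 1, 1, 1, 0, 0]
pred = [1, 1, 1, 0, 0, 1]
print(round(matthews_corrcoef(gold, pred), 3))  # → 0.25
```

On these toy labels, always predicting "acceptable" would reach 0.667 accuracy but an MCC of 0, which is why MCC is preferred when acceptable sentences outnumber unacceptable ones.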

Courses

Shanghai Jiao Tong University (SJTU): Introduction to Large Language Models; Language Intelligence; Academic Writing; College English; English Viewing, Listening and Speaking

Indiana University: Introduction to Linguistics; Math and Logic in Cognitive Science (TA)

Publications

# denotes corresponding author; * denotes equal contribution

Preprints

Liu, Y., Shen, Y., Zhu, H., Xu, L., Qian, Z., Song, S., Zhang, K., Tang, J., Zhang, P., Yang, B., Wang, R., & Hu, H#. (2024). ZhoBLiMP: a Systematic Assessment of Language Models with Linguistic Minimal Pairs in Chinese. paper. data.
Liu, Y., Zhang, Z., Zhang, W., Yue, S., Zhao, X., Cheng, X., Zhang, Y., & Hu, H#. (2023). ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models. arXiv, abs/2304.07666. paper. data.
Hai Hu*, Ziyin Zhang*, Weifang Huang, Jackie Yan-Ki Lai#, Aini Li, Yina Patterson, Jiahui Huang, Peng Zhang, Chien-Jer Charles Lin, Rui Wang#. (2023). Revisiting Acceptability Judgements: CoLAC - Corpus of Linguistic Acceptability in Chinese. arXiv, abs/2305.14091. *equal contributions. paper. data.

Benchmarking (Large) Language Models

We evaluate LLMs on various aspects of linguistic understanding, including but not limited to syntax, semantics, and pragmatics, in Chinese and other languages.
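Minimal-pair benchmarks in the BLiMP tradition, such as ZhoBLiMP, are typically scored without fine-tuning: a model "passes" a pair if it assigns a higher probability to the acceptable sentence than to its minimally different unacceptable counterpart. The sketch below illustrates only that scoring rule, with a toy add-one-smoothed character bigram model standing in for a real language model; the class `CharBigramLM`, the training sentences, and the example pair are all hypothetical, and this is not the exact protocol of the papers listed here.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

def bigrams(sentence):
    """Character bigrams of a sentence padded with boundary symbols."""
    chars = [BOS] + list(sentence) + [EOS]
    return list(zip(chars, chars[1:]))

class CharBigramLM:
    """Hypothetical toy scorer: add-one-smoothed character bigram model (stand-in for an LLM)."""

    def __init__(self, corpus):
        self.pair_counts = Counter()
        self.history_counts = Counter()
        vocab = {EOS}
        for sent in corpus:
            vocab.update(sent)
            for a, b in bigrams(sent):
                self.pair_counts[(a, b)] += 1
                self.history_counts[a] += 1
        self.vocab_size = len(vocab)

    def logprob(self, sentence):
        """Sum of log P(next char | previous char) over the padded sentence."""
        return sum(
            math.log((self.pair_counts[(a, b)] + 1) / (self.history_counts[a] + self.vocab_size))
            for a, b in bigrams(sentence)
        )

def minimal_pair_accuracy(lm, pairs):
    """Fraction of (acceptable, unacceptable) pairs where the acceptable sentence scores higher."""
    wins = sum(1 for good, bad in pairs if lm.logprob(good) > lm.logprob(bad))
    return wins / len(pairs)

# Invented toy data, not taken from ZhoBLiMP.
lm = CharBigramLM(["他吃了一個蘋果", "我吃了一碗飯", "她買了一本書"])
pairs = [("他吃了一個蘋果", "他了吃一個蘋果")]
print(minimal_pair_accuracy(lm, pairs))  # → 1.0
```

With a pretrained model, `logprob` would be replaced by the sum of the model's token log-probabilities; the pairwise comparison and the accuracy computation stay the same.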

Jushi Kai, Tianhang Zhang, Hai Hu, Zhouhan Lin. (2024). SH2: Self-Highlighted Hesitation Helps You Decode More Truthfully. Proceedings of EMNLP (Findings). paper
Ziyin Zhang*, Yikang Liu*, Weifang Huang, Junyu Mao, Rui Wang#, Hai Hu#. (2024). MELA: Multilingual Evaluation of Linguistic Acceptability. Proceedings of ACL. paper. data. *equal contributions
Shisen Yue, Siyuan Song, Xinyuan Cheng, Hai Hu#. (2024). Do Large Language Models Understand Conversational Implicature – A case study with a Chinese sitcom. Proceedings of CCL. paper. data. [Highlight Paper Award]

Natural Language Understanding/Natural Language Inference

We teach computers to understand human language through the task of natural language inference (NLI): deciding whether a premise entails, contradicts, or is neutral with respect to a hypothesis.
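The monotonicity-based NLI work listed below (MonaLog; Natural Language Inference with Monotonicity) builds on monotonicity reasoning: in an upward-monotone position a word may be replaced by a more general one while preserving truth, and in a downward-monotone position by a more specific one. The toy sketch below illustrates only that substitution rule for three-word sentences of the form "Q N V"; the quantifier table and the `HYPERNYM` map are hand-written assumptions, not MonaLog's actual polarity marking or knowledge base, and a `False` result means only "entailment not established" (neutral and contradiction are not distinguished).

```python
# Monotonicity of the (restrictor noun, scope verb) positions for a few quantifiers.
# "up": the word may be replaced by a MORE general one; "down": by a MORE specific one.
QUANTIFIERS = {
    "every": ("down", "up"),
    "some": ("up", "up"),
    "no": ("down", "down"),
}

# Hypothetical hand-written hypernym links (specific -> general).
HYPERNYM = {"poodle": "dog", "dog": "animal", "sprints": "runs", "runs": "moves"}

def is_more_general(general, specific):
    """True if `general` is reachable from `specific` by following hypernym links."""
    word = specific
    while word in HYPERNYM:
        word = HYPERNYM[word]
        if word == general:
            return True
    return False

def entails(premise, hypothesis):
    """Toy entailment check for 'Q N V' sentences sharing the same quantifier Q."""
    p, h = premise.split(), hypothesis.split()
    if len(p) != 3 or len(h) != 3 or p[0] != h[0] or p[0] not in QUANTIFIERS:
        return False  # outside the toy fragment: no entailment claimed
    for direction, p_word, h_word in zip(QUANTIFIERS[p[0]], p[1:], h[1:]):
        if p_word == h_word:
            continue
        if direction == "up" and not is_more_general(h_word, p_word):
            return False
        if direction == "down" and not is_more_general(p_word, h_word):
            return False
    return True

examples = [
    ("every dog runs", "every poodle runs"),        # restrictor is downward: specialize
    ("every dog runs", "every animal runs"),        # generalizing a downward slot fails
    ("some poodle sprints", "some animal moves"),   # both slots upward: generalize
    ("no dog runs", "no poodle sprints"),           # both slots downward: specialize
]
for premise, hypothesis in examples:
    print(f"{premise!r} => {hypothesis!r}: {entails(premise, hypothesis)}")
# Prints True, False, True, True for the four examples.
```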

Aikaterini-Lida Kalouli*, Hai Hu*, Alexander F. Webb, Lawrence S. Moss, Valeria de Paiva. (2023). Curing the SICK and other NLI maladies. Computational Linguistics. 49 (1): 199–243. doi: https://doi.org/10.1162/coli_a_00465. *equal contributions. paper. data. (SSCI)
Xu, Liang, Xiaojing Lu, Chenyang Yuan, Xuanwei Zhang, Huilin Xu, Hu Yuan, Guoao Wei, Pan Xiang, Xin Tian, Hai Hu. (2021). FewCLUE: A Chinese few-shot learning evaluation benchmark. arXiv preprint arXiv:2107.07498. paper. code.
Hu, Hai, He Zhou, Zuoyu Tian, Yiwen Zhang, Yina Ma, Yanting Li, Yixin Nie, Kyle Richardson (2021). Investigating Transfer Learning in Multilingual Pre-trained Language Models through Chinese Natural Language Inference. In: Findings of ACL. paper. code.
Xu, Liang, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, Kyle Richardson, and Zhenzhong Lan (2020). CLUE: A Chinese Language Understanding Evaluation Benchmark. In Proceedings of the 28th International Conference on Computational Linguistics (COLING). pp. 4762–4772. paper. website. github page
Hu, Hai, Kyle Richardson, Liang Xu, Lu Li, Sandra Kuebler, and Larry Moss. (2020). OCNLI: Original Chinese Natural Language Inference. In: Findings of the Association for Computational Linguistics: EMNLP 2020. pp. 3512–3526. paper. code and data. leaderboard.
Richardson, Kyle, Hai Hu, Larry Moss, and Ashish Sabharwal. (2020). Probing Natural Language Inference Models through Semantic Fragments. In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence. pp. 8713-8721. paper. code and data.
Hu, Hai, Qi Chen, Kyle Richardson, Atreyee Mukherjee, Lawrence S Moss, and Sandra Kuebler. (2020). MonaLog: a Lightweight System for Natural Language Inference Based on Monotonicity. In: Proceedings of the Society for Computation in Linguistics 2020. pp. 319-329. paper. poster. code.
Hu, Hai, Qi Chen and Larry Moss. (2019). Natural Language Inference with Monotonicity. In Proceedings of the 13th International Conference on Computational Semantics (IWCS 2019), pp. 8–15. Gothenburg, Sweden. paper.
Hu, Hai, and Lawrence S. Moss. (2018). Polarity Computations in Flexible Categorial Grammar. In Proceedings of the 7th Joint Conference on Lexical and Computational Semantics: *SEM, pp. 124–129. New Orleans, Louisiana, USA. paper. poster. code.

Semantic change

Here I work on detecting semantic change using word embeddings (word2vec, GloVe) in low-resource scenarios, e.g., medieval Spanish.
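One common recipe for this kind of study trains a separate embedding space per time period, aligns the spaces with orthogonal Procrustes (independently trained spaces are at best comparable only up to a rotation), and ranks words by the cosine distance between their aligned vectors. The sketch below shows that recipe with NumPy under loud assumptions: the embeddings are random synthetic stand-ins with one word ("algo") deliberately perturbed, the placeholder vocabulary is hypothetical, and this is not necessarily the pipeline used in the papers below.

```python
import numpy as np

def procrustes_align(source, target):
    """Return source @ R, where R is the orthogonal matrix minimizing ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)

def cosine_distances(a, b):
    """Row-wise cosine distance (1 - cosine similarity) between two same-shaped matrices."""
    a_unit = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a_unit * b_unit, axis=1)

rng = np.random.default_rng(0)
# Hypothetical shared vocabulary: "algo" plus 199 placeholder words present in both periods.
vocab = ["algo"] + [f"w{i}" for i in range(199)]
dim = 20

# Synthetic stand-ins for two period-specific embedding spaces:
# the later space is an arbitrary rotation of the earlier one plus a little noise...
old = rng.normal(size=(len(vocab), dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
new = old @ rotation + 0.01 * rng.normal(size=old.shape)
# ...except that "algo" receives an unrelated vector, simulating a shift in its usage.
new[vocab.index("algo")] = rng.normal(size=dim)

distances = cosine_distances(procrustes_align(old, new), new)
ranked = sorted(zip(vocab, distances), key=lambda pair: pair[1], reverse=True)
for word, dist in ranked[:3]:
    print(f"{word}: {dist:.3f}")
```

Without the alignment step, every word would show a large distance simply because of the arbitrary rotation between the two spaces; after alignment only the perturbed word stands out.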

Amaral, Patrícia, Hai Hu and Sandra Kübler (2023). "Tracing semantic change with distributional methods: The contexts of algo". Diachronica. https://doi.org/10.1075/dia.21012.ama paper (SSCI).
Hu, Hai, Patrícia Amaral and Sandra Kübler (2022). "Word Embeddings and Semantic Shifts in Historical Spanish: Methodological Considerations". Digital Scholarship in the Humanities. Volume 37, Issue 2, Pages 441–461. https://doi.org/10.1093/llc/fqab050 paper. code (SSCI)

Corpus-based translation studies/treebanks of translated Chinese

I am also interested in the morphological, syntactic and stylistic characteristics of translated Chinese (翻譯漢語) and Europeanized Chinese (歐化漢語).

To this end, I 1) employ machine learning methods to study translations and 2) build treebanks (=syntactically annotated corpora) to look into the syntactic features of translationese.
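As a minimal illustration of the machine-learning side, the sketch below represents each text by the relative frequencies of a few marker characters and assigns it to the nearest class centroid. The marker list (被, 的, and some pronoun characters), the four training snippets, and the nearest-centroid classifier are all assumptions chosen for illustration; they are not the features, classifier, or data used in the publications below.

```python
import math
from collections import Counter

# Hypothetical marker characters, chosen for illustration only.
MARKERS = ["被", "的", "他", "她", "們"]

def features(text):
    """Relative frequency of each marker character in the text."""
    counts = Counter(text)
    total = max(len(text), 1)
    return [counts[m] / total for m in MARKERS]

def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def train(labeled_texts):
    """Map each label to the mean feature vector of its training texts."""
    by_label = {}
    for text, label in labeled_texts:
        by_label.setdefault(label, []).append(features(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}

def predict(model, text):
    """Return the label whose centroid is closest (Euclidean distance) to the text's features."""
    x = features(text)
    return min(model, key=lambda label: math.dist(x, model[label]))

# Invented toy snippets, not drawn from any corpus.
training = [
    ("他的書被他的朋友借走了，他們的計劃也被取消了。", "translated"),
    ("她的想法被她的同事們認為是正確的。", "translated"),
    ("書借走了，計劃也取消了。", "original"),
    ("同事都說這個想法對。", "original"),
]
model = train(training)
print(predict(model, "他的車被他的鄰居們開走了。"))  # → translated
print(predict(model, "車開走了，人也走了。"))  # → original
```

Real studies would use far larger corpora, cross-validation, and richer lexical and syntactic features; the nearest-centroid rule is used here only because it fits in a few transparent lines.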

Hu, Hai and Sandra Kübler. (2021). Investigating Translated Chinese and Its Variants Using Machine Learning. In Natural Language Engineering. Volume 27, Issue 3, May 2021, pp. 339-372. https://doi.org/10.1017/S1351324920000182 (SCI/SSCI/AHCI) paper. code.
Hu, Hai, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Sandra Kübler, and Chien-Jer Charles Lin (2020). "Building a Literary Treebank for Translation Studies in Chinese". In: Proceedings of 19th International Workshop on Treebanks and Linguistic Theories (TLT). pp. 18-31. paper.
Hu, Hai, Wen Li, and Sandra Kübler. (2018). Detecting Syntactic Features of Translated Chinese. In Proceedings of the 2nd Workshop on Stylistic Variation, pp. 20-28. New Orleans, Louisiana, USA. paper. slides. video presentation.

Other papers

I'm a linguist, so I also collaborate with other linguists on very linguistic-y projects where computational modeling is sometimes used.

Li, A., Tamminga, M., & Hu, H. (2023). Intra- and interspeaker repetitiveness in Chengdu Mandarin locative variation. Language Variation and Change, 1-21. doi:10.1017/S095439452300008X Paper.
Lin, Chien-Jer Charles, and Hai Hu. (2023). Linking comprehension and production: Frequency distribution of Chinese relative clauses in the Sinica Treebank. In Chu-Ren Huang, Shukai Hsieh, & Peng Jin (eds.) Chinese Language Resources: Data Collection, Linguistic Analysis, Annotation and Language Processing. Springer. https://doi.org/10.1007/978-3-031-38913-9_23 paper
Hu, Hai and Yiwen Zhang. (2017). Path of Vowel Raising in Chengdu Dialect of Mandarin. In Proceedings of the 29th North America Conference on Chinese Linguistics. Rutgers, NJ. pp. 481-498. paper. abstract.

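For a full list of publications, see: https://huhailinguist.github.io/publications/

Translations

Surfaces and Essences: Analogy as the Fuel and Fire of Thinking (《表象與本質：類比，思考之源和思維之火》), by Douglas Hofstadter (US) and Emmanuel Sander (France); Chinese translation by Liu Jian, Hu Hai, and Chen Qi; Zhejiang People's Publishing House, 2018. Douban page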

Recent talks:

2023/04: Progress and Prospects of Pre-trained Models. SJTU SFL.
2022/03: Examining the Replicability of Grammaticality Judgments in Chinese Journal Articles: Dialectal Influences and Sources of Variability. Annual Conference on Human Sentence Processing (UC Santa Cruz; Online)
2021/12: Recent progress in natural language inference. AWS AI Lab Shanghai.
2021/11: Every time I hire a linguist, my accuracy goes down: why NLU still needs linguists now? Fudan University NLP lab.