Publish: 2023-04-08 | Modify: 2023-04-08
ChatGLM-6B is an open-source bilingual (Chinese-English) conversational language model based on the General Language Model (GLM) architecture, with about 6.2 billion parameters. It is optimized with techniques similar to those behind ChatGPT: trained on roughly 1T tokens of Chinese and English data, then refined with supervised fine-tuning, feedback bootstrapping, and reinforcement learning from human feedback.
ChatGLM-6B is jointly developed by the Knowledge Engineering Group (KEG) Lab at Tsinghua University and Zhipu AI. With model quantization, it can be deployed locally on consumer-grade GPUs (a minimum of 6GB of VRAM at the INT4 quantization level).
ChatGLM-6B can be understood as a locally deployable lightweight version of ChatGPT.
After several attempts, xiaoz finally got the ChatGLM-6B conversational language model running on Windows 10. This article records and shares the whole process.
This article is intended for readers interested in artificial intelligence and assumes a certain amount of programming and computer knowledge. If you are familiar with Python, you will follow it more easily.
ChatGLM-6B has certain requirements for both hardware and software. Here is xiaoz's hardware information:
Software environment:
This article assumes a certain level of programming and computer knowledge, so it does not explain how to install or use the software tools above in detail. If you are not familiar with them, you may want to stop reading here.
ChatGLM-6B is open-sourced on GitHub: https://github.com/THUDM/ChatGLM-6B
First, you need to clone the code using Git:
git clone https://github.com/THUDM/ChatGLM-6B.git
Next, xiaoz sets pip to use the Aliyun mirror source to facilitate the smooth installation of various Python dependencies. The command is:
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
pip config set install.trusted-host mirrors.aliyun.com
Then, go into the ChatGLM-6B directory and install the Python dependencies with the following command:
pip install -r requirements.txt
Next, download the model. Because xiaoz's GPU is fairly weak, xiaoz chose the 4-bit quantized model. It is best to download the model in advance, since the built-in Python download script is slow and fails easily. The ChatGLM-6B authors host the weights on the Hugging Face Hub, so we download them from there. For the 4-bit quantized model, run the following command:
git clone -b int4 https://huggingface.co/THUDM/chatglm-6b.git
The pytorch_model.bin file is quite large. If the git command is slow or fails, you can download pytorch_model.bin manually and place it in the local repository directory.
Note: You cannot download only the .bin file; the .json, .py, and other files must sit in the same directory. The recommended approach is to clone the whole repository with Git, then manually download the .bin file and merge it into that folder.
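Before loading the model, a short sanity check like the sketch below can confirm that the local model directory contains the files the loader expects. The path and the file list here are illustrative assumptions based on the int4 branch at the time of writing; adjust them to match your own clone.
# Sanity check (sketch): verify the local model directory looks complete
import os
model_dir = "D:/apps/ChatGLM-6B/model/int4/chatglm-6b"  # adjust to your own path
expected = ["config.json", "tokenizer_config.json", "pytorch_model.bin"]  # illustrative subset
for name in expected:
    path = os.path.join(model_dir, name)
    if os.path.exists(path):
        print(f"{name}: found ({os.path.getsize(path) / 1024 / 1024:.1f} MB)")
    else:
        print(f"{name}: MISSING")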
Enter the Python terminal and start running the ChatGLM-6B model using the following command:
# Specify the path to the model you cloned from Hugging Face Hub
mypath = "D:/apps\ChatGLM-6B\model\int4\chatglm-6b"
# Import dependencies
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained(mypath, trust_remote_code=True)
# Load the weights in half precision and move them to the GPU
model = AutoModel.from_pretrained(mypath, trust_remote_code=True).half().cuda()
model = model.eval()
# Run one round of dialogue ("你好" means "Hello")
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
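Building on the same chat call, a minimal interactive loop might look like the sketch below. It simply feeds each new query back in together with the accumulated history; the prompt strings and the exit keyword are arbitrary choices, not the official CLI demo from the repository.
# Minimal interactive loop around model.chat (sketch; assumes model and tokenizer are loaded as above)
history = []
while True:
    query = input("You: ").strip()
    if query.lower() in ("exit", "quit"):
        break
    response, history = model.chat(tokenizer, query, history=history)
    print("ChatGLM-6B:", response)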
During the runtime, it didn't go as smoothly as I imagined. I encountered an error message saying "Torch not compiled with CUDA enabled." I solved it by referring to this issue.
The solution is to execute the command:
python -c "import torch; print(torch.cuda.is_available())"
If it returns False, it means the installed PyTorch does not support CUDA. I then executed the following command:
pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 -f https://download.pytorch.org/whl/cu118/torch_stable.html
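After reinstalling, a quick check like the one below confirms that the CUDA-enabled build is actually the one being picked up; the exact version string printed will of course depend on your installation.
# Verify that the CUDA-enabled build of PyTorch is now active
import torch
print(torch.__version__)          # should end in +cu118 rather than +cpu
print(torch.cuda.is_available())  # should now print True
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # name of the detected GPU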
Finally, there were no more errors. However, everyone's hardware and software are different, so the encountered errors may vary. Just be flexible and adapt accordingly.
Running even the 4-bit quantized model is quite resource-intensive: the 8GB of VRAM on the 3050 is easily exhausted, and the response speed feels slow.
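If you want to see how close you are to the limit, PyTorch can report GPU memory usage directly; the sketch below is one simple way to print it after a chat call (the formatting is just illustrative).
# Print a rough picture of GPU memory usage (run after a chat call)
import torch
total = torch.cuda.get_device_properties(0).total_memory
allocated = torch.cuda.memory_allocated(0)
reserved = torch.cuda.memory_reserved(0)
print(f"total:     {total / 1024**3:.2f} GB")
print(f"allocated: {allocated / 1024**3:.2f} GB")
print(f"reserved:  {reserved / 1024**3:.2f} GB")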
The official repository also provides ways to run the model through a web UI and an API. I ran into some errors with the web method that I haven't resolved yet; the CLI method described above works fine.
To check the PyTorch version you have installed, you can enter the following code in the Python interactive environment:
import torch
print(torch.__version__)
If the result shows x.x.x+cpu, the build likely does not support CUDA; refer to the fix for the "Torch not compiled with CUDA enabled" error above. Check the PyTorch version again afterwards: if it shows +cuxxx, the build supports the GPU. In other words, a +cpu build, which has no GPU support, will not work here; only a +cu build with GPU support can be used.
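The same distinction can be checked programmatically; this small sketch just combines the version-string check with torch.cuda.is_available():
# Distinguish a CPU-only build from a CUDA build of PyTorch
import torch
if "+cpu" in torch.__version__ or not torch.cuda.is_available():
    print("CPU-only PyTorch detected; reinstall a +cu build to use the GPU.")
else:
    print(f"CUDA build detected: {torch.__version__}")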
I had some brief conversations with ChatGLM-6B and feel the results are quite good. It is not yet on the same level as ChatGPT overall, but ChatGLM-6B is open-sourced by a Chinese team and runs on consumer-grade GPUs, which deserves real credit. I hope the team keeps at it and continues to close the gap with ChatGPT.
ChatGLM-6B GitHub repository: https://github.com/THUDM/ChatGLM-6B
I come from China and I am a freelancer. I specialize in Linux operations, PHP, Golang, and front-end development. I have developed open-source projects such as Zdir, ImgURL, CCAA, and OneNav.