Publish: 2024-08-18 | Modify: 2024-08-18
Ollama is an open-source tool that allows users to conveniently run various large open-source models locally, including Tsinghua University's ChatGLM, Alibaba's Qwen, and Meta's Llama, among others. Currently, Ollama is compatible with the three major operating systems: macOS, Linux, and Windows. This article will introduce how to install Ollama with Docker, use it to run large models locally, and connect it to one-api so that the required large language models can be accessed through a unified API interface.
Large models have high hardware requirements, so the higher the machine configuration, the better; a dedicated graphics card helps a great deal, and at least 32GB of RAM is recommended. The author has deployed it on a dedicated server. Note: this server does not have a graphics card, so the models can only run on the CPU.
Ollama now supports Docker installation, which greatly simplifies deployment for server users. Here we will use Docker Compose to run Ollama. First, create a new docker-compose.yaml file with the following content:
version: '3'
services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ./data:/root/.ollama
    restart: always
Then, enter the command docker compose up -d (or docker-compose up -d) to run it. Once it is running, visit http://IP:11434; if you see the message "Ollama is running", the deployment succeeded, as shown in the image below:
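You can also verify from the command line (a minimal check; replace localhost with your server's IP when testing remotely):

# Should print "Ollama is running" if the service is up
curl http://localhost:11434

# List the models currently available on this instance
curl http://localhost:11434/api/tags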
If your machine has a GPU, you can enable GPU support with additional parameters; refer to: https://hub.docker.com/r/ollama/ollama
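For example, with an NVIDIA card and the NVIDIA Container Toolkit installed on the host, the compose file can reserve the GPU for the container. This is a sketch based on Docker Compose's standard GPU reservation syntax; I have not tested it on my CPU-only server:

services:
  ollama:
    image: ollama/ollama
    container_name: ollama
    ports:
      - "11434:11434"
    volumes:
      - ./data:/root/.ollama
    restart: always
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]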
After installing Ollama, you need to download large models. Supported large models can be found on the Ollama official website: https://ollama.com/library. Ollama does not provide a web interface by default and needs to be used via the command line. First, enter the container with the command:
docker exec -it ollama /bin/bash
Once inside the container, go to the official website to find the large model you want to download. For example, to download Alibaba's Qwen2 model, use the command:
ollama run qwen2
Once the model is downloaded and running, you can have a conversation via the command line, as shown in the image below:
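You do not have to chat interactively. ollama pull only downloads a model, and Ollama also exposes a REST API on port 11434, so you can query the model from the host. A minimal sketch, reusing the qwen2 model from above:

# Download a model without starting an interactive chat
ollama pull qwen2

# Ask the model a question through the built-in REST API (run on the host)
curl http://localhost:11434/api/generate -d '{
  "model": "qwen2",
  "prompt": "Hello!",
  "stream": false
}'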
Common Ollama Commands
Here are some common Ollama commands:
ollama run llama3:8b-text   # download (if necessary) and run the specified model
ollama list                 # list the models that have been downloaded
ollama ps                   # list the models that are currently running
ollama rm llama3:8b-text    # remove a downloaded model
Tip: more commands can be viewed by entering ollama -h.
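Because our Ollama runs inside Docker, these commands can also be issued from the host without entering the container, using the same docker exec pattern as before:

docker exec -it ollama ollama list
docker exec -it ollama ollama ps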
Currently, I have downloaded the llama2 / qwen2 / glm4 / llama3 / phi3 large models for a quick hands-on comparison, and I have come away with some impressions that may not be very rigorous or accurate:

- The llama models are not friendly to Chinese (understandable, as they are foreign models).
- phi3:3.8b, a small model launched by Microsoft, supports multiple languages. In practice, 3.8b seems quite limited, perhaps because the parameter count is too small; I wonder whether the 14b version would do better.
- glm4 and qwen2 are more friendly to Chinese.
- Models of 7b and above can generally understand and converse normally, while smaller models often make mistakes.
- Running a 7b model on pure CPU is slightly slow.

Connecting Ollama to one-api

One-api is an open-source AI middleware service that aggregates the APIs of various large models, such as OpenAI, ChatGLM, and Wenxin Yiyan, behind a unified OpenAI-style interface. For example, ChatGLM and Wenxin Yiyan expose different APIs; one-api integrates them and provides a unified OpenAI-compatible calling method, so that switching models only requires changing the model name in the request, which eliminates interface differences and reduces development difficulty.
For specific installation methods for one-api, please refer to the official project address: https://github.com/songquanpeng/one-api
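For reference, the project README (at the time of writing) suggests a Docker run command roughly like the following; check the repository for the current image name and flags:

docker run --name one-api -d --restart always \
  -p 3000:3000 \
  -e TZ=Asia/Shanghai \
  -v /home/ubuntu/data/one-api:/data \
  justsong/one-api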
Then, in the one-api backend, go to Channels >> Add a new channel and fill in the Ollama address: http://IP:11434, as shown in the image below:
After connecting, we can request one-api and pass the specific model name for testing, using the following command:
curl https://ai.xxx.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "qwen2",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
Note: change ai.xxx.com to your one-api domain, and replace sk-xxx with the token you created in one-api.

If the call is successful, it indicates that Ollama has been successfully connected to one-api.
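A successful call returns a standard OpenAI-style chat completion, roughly of the following shape (abbreviated; the exact fields and the reply text will vary):

{
  "object": "chat.completion",
  "model": "qwen2",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help you today?" },
      "finish_reason": "stop"
    }
  ]
}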
Encountered Issues
The author attempted to call Ollama through one-api in stream mode but received a blank response. Through the project's issues, it was discovered to be a bug in one-api. For now, downgrading one-api to version 0.6.6 solves the issue, and we look forward to the author fixing this problem in a future release.
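For reference, stream mode is enabled by adding "stream": true to the request body; this is the variant that returned an empty response on the affected versions (same placeholder domain and token as above):

curl https://ai.xxx.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-xxx" \
  -d '{
    "model": "qwen2",
    "stream": true,
    "messages": [
      { "role": "user", "content": "Hello!" }
    ]
  }'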
Security Recommendations

Since Ollama itself does not provide an authentication mechanism, deploying Ollama on a public server poses security risks: anyone who knows your IP and port can make API calls, which is very unsafe. In a production environment, we can take some measures to improve security.
Method 1: Linux Built-in Firewall
Use the system firewall to block external access to port 11434, so that Ollama can only be reached locally via 127.0.0.1.
Method 2: Nginx Reverse Proxy
Have Ollama listen only on 127.0.0.1:11434, and set up blacklists (deny) and whitelists (allow) on Nginx. Minimal sketches of both approaches are shown below.
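The following sketch illustrates both approaches, assuming ufw and Nginx are available; the IP address and domain are placeholders:

# Method 1 (sketch): allow a trusted IP to reach port 11434, block everyone else
# (adapt for iptables/firewalld if you do not use ufw)
sudo ufw allow from 203.0.113.10 to any port 11434 proto tcp
sudo ufw deny 11434/tcp

# Method 2 (sketch): publish the port on loopback only in docker-compose.yaml
#   ports:
#     - "127.0.0.1:11434:11434"
# then front it with a minimal Nginx reverse proxy
sudo tee /etc/nginx/conf.d/ollama.conf <<'EOF'
server {
    listen 80;
    server_name ollama.example.com;

    # whitelist trusted clients first, then deny everything else
    allow 203.0.113.10;
    deny all;

    location / {
        proxy_pass http://127.0.0.1:11434;
        proxy_set_header Host $host;
    }
}
EOF
sudo nginx -s reload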
Summary

As an open-source tool, Ollama provides users with a convenient way to deploy and call large models locally. Its good compatibility and flexibility make it easy to run large language models across various operating systems. With the Docker-based installation described here, users can get started quickly and flexibly use various large models, providing strong support for development and research. However, because it lacks a built-in authentication mechanism, users should take appropriate security measures in production environments to prevent unauthorized access. Overall, Ollama has significant practical value in promoting the application and development of local AI models; if it adds an authentication mechanism in the future, it will undoubtedly become a powerful assistant for AI developers.