forgejo-crawler-blocker
What does a GPT-training web crawler see when it tries to access our Forgejo instance and look at every single file at every single commit, ignoring robots.txt and sending a generic user-agent header? Here is the preview:
Yep, that's right: the entire Christian Bible (4 MB), at about 100 bytes per second.
maintenance
If anyone needs to clear the data to unblock someone, these are the commands to run on paimon:
sudo -i
docker stop gitea_forgejo-crawler-blocker
rm /etc/docker-compose/gitea/forgejo-crawler-blocker/traffic.db
docker start gitea_forgejo-crawler-blocker
persistent data storage
/forgejo-crawler-blocker/data inside the Docker container.
forest's manual build process
Run on the server (paimon):
cd /home/forest/forgejo-crawler-blocker && \
  git pull sequentialread main && \
  cd /etc/docker-compose/gitea && \
  docker stop gitea_forgejo-crawler-blocker_1 || true && \
  docker rm gitea_forgejo-crawler-blocker_1 || true && \
  docker image rm gitea_forgejo-crawler-blocker || true && \
  rm -f forgejo-crawler-blocker/traffic.db && \
  docker-compose up -d && \
  sleep 1 && \
  docker logs -n 1000 -f gitea_forgejo-crawler-blocker_1