2025-11-04 18:51:05 +01:00

# BooruDex

BooruDex (short for "Booru Index") is a booru scraper that supports finding similar images.
It can optionally download images and serve them locally.

## Initial thought

### Hub (BooruDex)

The hub is the main server, with access to the database and file storage. It has no access to the internet,
but workers can connect to the hub to receive tasks.

The hub is responsible for the following tasks:

- Image hashing
- Media storage

#### File Storage

##### Media

Media files are named by their UUID, with the MIME subtype as the file extension.
The first two pairs of hex digits of the UUID become nested directories, and the remaining digits (hyphens removed) form the filename.

For example, a JPEG image with UUID f81d4fae-7dec-11d0-a765-00a0c91e6bf6 would be stored as:
`media/f8/1d/4fae7dec11d0a76500a0c91e6bf6.jpeg`

##### Thumbnails

Thumbnails are stored exactly like media files, except that thumbnails are always encoded as JPEG
and are placed in a different directory from media files.

E.g.: `thumbnails/f8/1d/4fae7dec11d0a76500a0c91e6bf6.jpeg`

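The storage layout for media and thumbnails can be sketched as a small path helper. This is only an illustration: the function names `storage_path` and `thumbnail_path` are made up for this example, and the MIME type is assumed to be a plain `type/subtype` string without parameters.

```python
import uuid
from pathlib import PurePosixPath


def storage_path(root: str, file_uuid: uuid.UUID, mime: str) -> PurePosixPath:
    """Build the sharded path for a stored file (hypothetical helper)."""
    digits = file_uuid.hex            # 32 hex digits, hyphens removed
    subtype = mime.split("/", 1)[1]   # "image/jpeg" -> "jpeg"
    # First two digit pairs become directories, the rest is the filename.
    return PurePosixPath(root) / digits[:2] / digits[2:4] / f"{digits[4:]}.{subtype}"


def thumbnail_path(file_uuid: uuid.UUID) -> PurePosixPath:
    """Thumbnails use the same layout but are always JPEG."""
    return storage_path("thumbnails", file_uuid, "image/jpeg")


example = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
print(storage_path("media", example, "image/jpeg"))
print(thumbnail_path(example))
```

Splitting on the first two digit pairs keeps any single directory from growing too large, at most 256 subdirectories per level.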
#### Database

##### Tasks

A table containing tasks that the hub wants executed.

- id - Task ID
- domain - Booru domain of the task
- type - Type of the task (scraping, download, etc.)
- data - Task data (some URL, ID range, etc.)
- pending - Is it pending? If so, since when?
- assignee - Is it assigned? If so, to which worker?

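A minimal sketch of how the tasks table might look, assuming SQLite as the database (the notes do not name one). The column types, the task type names, and the helpers `add_task` and `assign_task` are assumptions for illustration. In particular, `pending` is interpreted here as the Unix time at which the task was handed to a worker.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE tasks (
        id       INTEGER PRIMARY KEY,
        domain   TEXT NOT NULL,  -- booru domain of the task
        type     TEXT NOT NULL,  -- e.g. 'scrape' or 'download' (names assumed)
        data     TEXT NOT NULL,  -- URL, ID range, etc.
        pending  INTEGER,        -- Unix time since which it is pending, else NULL
        assignee INTEGER         -- worker ID, else NULL
    )
    """
)


def add_task(db, domain, type_, data):
    """Queue a new, unassigned task and return its ID."""
    cur = db.execute(
        "INSERT INTO tasks (domain, type, data) VALUES (?, ?, ?)",
        (domain, type_, data),
    )
    return cur.lastrowid


def assign_task(db, task_id, worker_id, now):
    """Mark a task as pending for a worker; False if it was already assigned."""
    cur = db.execute(
        "UPDATE tasks SET pending = ?, assignee = ? WHERE id = ? AND assignee IS NULL",
        (now, worker_id, task_id),
    )
    return cur.rowcount == 1


task = add_task(db, "safebooru.org", "scrape", "1-100")
print(assign_task(db, task, worker_id=1, now=1762278665))
print(assign_task(db, task, worker_id=2, now=1762278700))
```

The `assignee IS NULL` guard in the update keeps two workers from being handed the same task.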
##### Tags

A table containing known tags and, optionally, their category. The combination of label and category must be unique.

- id - Tag ID
- label - Label on the tag
- category - Optional tag category

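The uniqueness rule needs some care when the category is optional. As a sketch, assuming SQLite: a plain `UNIQUE (label, category)` constraint treats NULL categories as distinct, so it would allow duplicate uncategorised tags. One possible workaround is a unique index over an expression, as below. The index name and the `tag_id` helper are hypothetical, and the index treats a NULL category and an empty-string category as the same value.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE tags (
        id       INTEGER PRIMARY KEY,
        label    TEXT NOT NULL,
        category TEXT              -- NULL when the tag has no category
    );
    -- Fold NULL into '' so that uncategorised duplicates are rejected too.
    CREATE UNIQUE INDEX tags_label_category
        ON tags (label, COALESCE(category, ''));
    """
)


def tag_id(db, label, category=None):
    """Return the ID for (label, category), inserting the tag if it is new."""
    row = db.execute(
        "SELECT id FROM tags WHERE label = ? AND category IS ?",  # IS matches NULL
        (label, category),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = db.execute(
        "INSERT INTO tags (label, category) VALUES (?, ?)", (label, category)
    )
    return cur.lastrowid


print(tag_id(db, "solo") == tag_id(db, "solo"))
print(tag_id(db, "solo", "general") == tag_id(db, "solo"))
```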
##### Boorus

A table containing a list of boorus being handled by BooruDex.

- id - Booru ID
- domain - The domain of the booru
- posts - The name of the table that contains booru posts
- tags - The name of the table that contains tag relations
- categories - The name of the table that contains tag categories
- latest - Known latest post in the booru

##### Booru_[id]_posts

A table containing post data for its booru.

- id - Post ID
- image - Media ID (referencing media table)
- thumb - Thumbnail ID (referencing thumb table)
- purity - A single character describing the purity of the post
- update - Last time the post entry was updated/tagged

##### Booru_[id]_tags

A table containing tag relations for its booru.

- tag - Tag ID (referencing tags table)
- post - Post ID (referencing booru_[id]_posts table)

##### Booru_[id]_categories

A table containing tag categories as they are represented by the booru.

- label - Tag label (unique)
- category - Tag category

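The relationship between the boorus table and the three per-booru tables can be sketched as follows, again assuming SQLite. The helper `register_booru`, the column types, and the `UNIQUE` constraint on `domain` are assumptions made for this example, not part of the design above.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """
    CREATE TABLE boorus (
        id         INTEGER PRIMARY KEY,
        domain     TEXT NOT NULL UNIQUE,
        posts      TEXT,     -- name of the posts table
        tags       TEXT,     -- name of the tag relation table
        categories TEXT,     -- name of the tag category table
        latest     INTEGER   -- latest known post ID
    )
    """
)


def register_booru(db: sqlite3.Connection, domain: str) -> int:
    """Insert a booru row and create its booru_[id]_* tables."""
    booru_id = db.execute(
        "INSERT INTO boorus (domain) VALUES (?)", (domain,)
    ).lastrowid
    # Table names cannot be bound as parameters; they are built only from
    # the integer ID, never from user-supplied text.
    posts = f"booru_{booru_id}_posts"
    tags = f"booru_{booru_id}_tags"
    categories = f"booru_{booru_id}_categories"
    db.executescript(
        f"""
        CREATE TABLE {posts} (
            id       INTEGER PRIMARY KEY,
            image    INTEGER,   -- media ID
            thumb    INTEGER,   -- thumbnail ID
            purity   TEXT,      -- single character
            "update" INTEGER    -- quoted because UPDATE is an SQL keyword
        );
        CREATE TABLE {tags} (
            tag  INTEGER NOT NULL,
            post INTEGER NOT NULL REFERENCES {posts} (id),
            PRIMARY KEY (tag, post)
        );
        CREATE TABLE {categories} (
            label    TEXT PRIMARY KEY,
            category TEXT
        );
        """
    )
    db.execute(
        "UPDATE boorus SET posts = ?, tags = ?, categories = ? WHERE id = ?",
        (posts, tags, categories, booru_id),
    )
    db.commit()
    return booru_id


print(register_booru(db, "safebooru.org"))
```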
##### Media

A table containing data about media.

- id - Media ID
- uuid - Unique v4 UUID for referencing the actual media file
- size - The size of the media file
- width - Media width
- height - Media height
- mime - Media MIME type
- dhash - Difference hash of the media
- phash - Perceptual hash of the media

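To illustrate the `dhash` column, here is a sketch of one common difference-hash formulation. It assumes the image has already been converted to grayscale and resized to 9x8 pixels by an image library, which is not shown. The bit order and the "left brighter than right" convention are choices made for this example, and both function names are hypothetical.

```python
def dhash(gray: list[list[int]]) -> int:
    """Difference hash of an 8-row, 9-column grid of grayscale values."""
    if len(gray) != 8 or any(len(row) != 9 for row in gray):
        raise ValueError("expected 8 rows of 9 grayscale values")
    bits = 0
    for row in gray:
        # One bit per horizontally adjacent pixel pair: 8 bits per row.
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits  # 64-bit integer


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest similar images."""
    return bin(a ^ b).count("1")


brightening = [list(range(0, 90, 10)) for _ in range(8)]    # left to right
darkening = [list(range(80, -10, -10)) for _ in range(8)]
print(hamming(dhash(brightening), dhash(darkening)))
```

Similar images are then found by comparing stored hashes by Hamming distance rather than by equality.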
##### Thumb

A table containing data about thumbnails.

- id - Thumbnail ID
- uuid - Unique v4 UUID for referencing the actual thumbnail file
- size - The size of the thumbnail file
- width - Thumbnail width
- height - Thumbnail height
- media - Media ID (referencing media table)

##### Workers

A table containing a list of known workers and their statistics.

- id - Worker ID
- uuid - Unique v4 UUID for referencing the actual worker
- ip - IP address the worker last connected from
- seen - Date the worker last connected
- scraped - The number of posts the worker has scraped
- thumbs - The number of thumbnails the worker has downloaded
- media - The number of media files the worker has downloaded

### Workers

Workers request a number of tasks from the hub, stating which task types and which booru domains
they support.

Worker types currently under consideration:

- Scraper - Scrapes a range of posts given their IDs and returns their tags/metadata, media URL and optionally a thumbnail.
- Downloader - Downloads media, given their URLs, and reports their MIME types.

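The task request described above could be modelled roughly like this. It is a sketch only: the `Task` and `TaskRequest` classes, their field names, and `pick_tasks` are hypothetical, and the wire format between worker and hub is left open.

```python
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Task:
    id: int
    domain: str
    type: str
    data: str


@dataclass(frozen=True)
class TaskRequest:
    count: int                 # how many tasks the worker wants
    types: frozenset[str]      # task types the worker supports
    domains: frozenset[str]    # booru domains the worker supports


def pick_tasks(open_tasks: list[Task], request: TaskRequest) -> list[Task]:
    """Return up to request.count open tasks the worker is able to handle."""
    matching = (
        task
        for task in open_tasks
        if task.type in request.types and task.domain in request.domains
    )
    return list(islice(matching, request.count))


queue = [
    Task(1, "safebooru.org", "scrape", "1-100"),
    Task(2, "konachan.com", "download", "https://konachan.com/"),
    Task(3, "safebooru.org", "scrape", "101-200"),
]
request = TaskRequest(
    count=1, types=frozenset({"scrape"}), domains=frozenset({"safebooru.org"})
)
print(pick_tasks(queue, request))
```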
### Planned Booru support

- [AllTheFallen](https://booru.allthefallen.moe)
- [Danbooru](https://danbooru.donmai.us)
- [Gelbooru](https://gelbooru.com)
- [Konachan](https://konachan.com)
- [Realbooru](https://realbooru.com)
- [Rule34](https://rule34.xxx)
- [Safebooru](https://safebooru.org)
- [Xbooru](https://xbooru.com)