# BooruDex

BooruDex (short for "Booru Index") is a booru scraper that supports finding similar images.
It can optionally download images and serve them locally.

## Initial thought

### Hub (BooruDex)

The hub is the main server, with access to the database and file storage. It has no access to the internet,
but workers can connect to the hub in order to receive tasks.

The hub is responsible for the following tasks:
- Image hashing
- Media storage

#### File Storage

##### Media

Media files use their UUID as the filename and their MIME subtype as the extension.
The first two pairs of hex digits of the UUID become nested directories, and the remaining digits (hyphens removed) form the filename.

For example, a JPEG image with UUID f81d4fae-7dec-11d0-a765-00a0c91e6bf6 would be stored as:
`media/f8/1d/4fae7dec11d0a76500a0c91e6bf6.jpeg`

##### Thumbnails

Thumbnails are stored exactly like media files, except that thumbnails are always stored as JPEG
and are placed in a separate directory from media files.

E.g.: `thumbnails/f8/1d/4fae7dec11d0a76500a0c91e6bf6.jpeg`
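
The path layout described above can be sketched in a few lines of Python. This is only an illustration: the function names `storage_path`, `media_path` and `thumbnail_path` are hypothetical, and the sketch assumes the MIME subtype is used verbatim as the extension.

```python
import uuid
from pathlib import PurePosixPath


def storage_path(root: str, file_uuid: uuid.UUID, extension: str) -> PurePosixPath:
    """Build <root>/<aa>/<bb>/<remaining hex digits>.<extension> from a UUID."""
    digits = file_uuid.hex  # 32 lowercase hex digits, hyphens already removed
    return PurePosixPath(root, digits[0:2], digits[2:4], f"{digits[4:]}.{extension}")


def media_path(file_uuid: uuid.UUID, mime: str) -> PurePosixPath:
    """Media files use the MIME subtype (the part after '/') as the extension."""
    subtype = mime.split("/", 1)[1]
    return storage_path("media", file_uuid, subtype)


def thumbnail_path(file_uuid: uuid.UUID) -> PurePosixPath:
    """Thumbnails are always JPEG and live in their own directory."""
    return storage_path("thumbnails", file_uuid, "jpeg")


example = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
print(media_path(example, "image/jpeg"))
print(thumbnail_path(example))
```

Subtypes such as `svg+xml` would produce unusual extensions under this assumption, so a real implementation might need a mapping table instead.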

#### Database

##### Tasks

A table containing tasks that the hub wants executed.

- id - Task ID
- domain - Booru domain of the task
- type - Type of the task (scraping, download, etc.)
- data - Task data (some URL, ID range, etc.)
- pending - Is it pending? If so, since when?
- assignee - Is it assigned? If so, to whom?
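
As a rough sketch, the tasks table could look like this in SQLite. The database engine, the column types, and the choice to store `pending` as a nullable timestamp and `assignee` as a nullable worker ID are all assumptions made for illustration; the domain `booru.example` is a placeholder.

```python
import sqlite3

# Assumed schema: NULL in `pending` means "not pending",
# NULL in `assignee` means "not assigned to any worker".
TASKS_SCHEMA = """
CREATE TABLE tasks (
    id       INTEGER PRIMARY KEY,
    domain   TEXT NOT NULL,
    type     TEXT NOT NULL,
    data     TEXT NOT NULL,
    pending  TEXT,
    assignee INTEGER
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(TASKS_SCHEMA)

# Queue a scraping task for a range of post IDs.
conn.execute(
    "INSERT INTO tasks (domain, type, data, pending) "
    "VALUES (?, ?, ?, datetime('now'))",
    ("booru.example", "scraping", "1-100"),
)

# Tasks that no worker has picked up yet.
unassigned = conn.execute(
    "SELECT id, domain, type, data FROM tasks WHERE assignee IS NULL"
).fetchall()
print(unassigned)
```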

##### Tags

A table containing known tags and, optionally, their category. The combination of label and category must be unique.

- id - Tag ID
- label - Label on the tag
- category - Optional tag category
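
The uniqueness rule on label and category could be enforced by the database itself. The sketch below assumes SQLite and uses an empty string rather than NULL for "no category", because SQLite treats NULL values as distinct inside a UNIQUE constraint. The helper `get_or_create_tag` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE tags (
    id       INTEGER PRIMARY KEY,
    label    TEXT NOT NULL,
    category TEXT NOT NULL DEFAULT '',
    UNIQUE (label, category)
)
""")


def get_or_create_tag(conn: sqlite3.Connection, label: str, category: str = "") -> int:
    """Return the ID of the (label, category) pair, inserting it if missing."""
    # The UNIQUE constraint makes the insert a no-op for an existing pair.
    conn.execute(
        "INSERT OR IGNORE INTO tags (label, category) VALUES (?, ?)",
        (label, category),
    )
    row = conn.execute(
        "SELECT id FROM tags WHERE label = ? AND category = ?",
        (label, category),
    ).fetchone()
    return row[0]


first = get_or_create_tag(conn, "sky", "general")
again = get_or_create_tag(conn, "sky", "general")   # same pair, same ID
other = get_or_create_tag(conn, "sky", "artist")    # same label, new category
```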

##### Boorus

A table containing a list of boorus being handled by BooruDex.

- id - Booru ID
- domain - The domain of the booru
- posts - The name of the table that contains booru posts
- tags - The name of the table that contains tag relations
- categories - The name of the table that contains tag categories
- latest - Latest known post in the booru

##### Booru_[id]_posts

A table containing post data for its booru.

- id - Post ID
- image - Media ID (referencing media table)
- thumb - Thumbnail ID (referencing thumb table)
- update - Last time the post entry was updated/tagged

##### Booru_[id]_tags

A table containing tag relations for its booru.

- tag - Tag ID (referencing tags table)
- post - Post ID (referencing booru_[id]_posts table)

##### Booru_[id]_categories

A table containing tag categories as they are represented by the booru.

- label - Tag label (unique)
- category - Tag category
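
The three per-booru tables could be created together whenever a booru is registered. The following is a minimal sketch assuming SQLite; the function `create_booru_tables` and all column types are hypothetical. Note that `update` is an SQL keyword, so the column name has to be quoted.

```python
import sqlite3


def create_booru_tables(conn: sqlite3.Connection, booru_id: int) -> dict:
    """Create the posts, tags and categories tables for one booru."""
    # Table names cannot be bound as query parameters, so force the ID
    # to an integer before formatting it into the SQL text.
    booru_id = int(booru_id)
    names = {
        kind: f"booru_{booru_id}_{kind}"
        for kind in ("posts", "tags", "categories")
    }
    conn.execute(f"""
        CREATE TABLE {names['posts']} (
            id       INTEGER PRIMARY KEY,
            image    INTEGER,
            thumb    INTEGER,
            "update" TEXT
        )""")
    # One row per (tag, post) relation; the pair itself is the key.
    conn.execute(f"""
        CREATE TABLE {names['tags']} (
            tag  INTEGER NOT NULL,
            post INTEGER NOT NULL,
            PRIMARY KEY (tag, post)
        )""")
    conn.execute(f"""
        CREATE TABLE {names['categories']} (
            label    TEXT NOT NULL UNIQUE,
            category TEXT
        )""")
    return names


conn = sqlite3.connect(":memory:")
names = create_booru_tables(conn, 1)
```

The returned names are what the `posts`, `tags` and `categories` columns of the boorus table would hold.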

##### Media

A table containing data about media.

- id - Media ID
- uuid - Unique v4 UUID for referencing the actual media file
- size - The size of the media file
- width - Media width
- height - Media height
- mime - Media MIME type
- dhash - Difference hash of the media
- phash - Perceptual hash of the media
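
To illustrate how a difference hash supports finding similar images, here is a minimal sketch of one common formulation: each bit records whether a pixel is brighter than its right-hand neighbour, and two hashes are compared by Hamming distance. It assumes the image has already been converted to grayscale and shrunk to a small grid (commonly 9x8 pixels); that step needs an image library and is left out. The function names are hypothetical.

```python
def dhash(pixels: list) -> int:
    """Difference hash of a small grayscale grid given as a list of rows.

    Each row contributes len(row) - 1 bits: 1 if a pixel is brighter
    than the pixel to its right, otherwise 0.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


# Tiny 3x2 grids for demonstration; real inputs would be larger.
original = [[10, 20, 5], [7, 7, 9]]
altered = [[10, 20, 25], [7, 7, 9]]

distance = hamming_distance(dhash(original), dhash(altered))
print(distance)
```

A small distance suggests the two images are visually similar; the threshold to use is a tuning decision.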

##### Thumb

A table containing data about thumbnails.

- id - Thumbnail ID
- uuid - Unique v4 UUID for referencing the actual thumbnail file
- size - The size of the thumbnail
- width - Thumbnail width
- height - Thumbnail height
- media - Media ID (referencing media table)

##### Workers

A table containing a list of known workers and their statistics.

- id - Worker ID
- uuid - Unique v4 UUID for referencing the actual worker
- ip - Most recent IP address the worker connected from
- seen - Last time the worker connected
- scraped - The number of posts the worker has scraped
- thumbs - The number of thumbnails the worker has downloaded
- media - The number of media files the worker has downloaded

### Workers

Workers request a number of tasks from the hub, stating which task types and which
booru domains they support.

Current ideas for worker types:

- Scraper - Scrapes a range of posts given their IDs and returns their tags/metadata, media URL and optionally a thumbnail.
- Downloader - Downloads media given their URLs and reports their MIME types.
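
The request flow described above might be handled on the hub roughly as follows. This is an in-memory sketch only: the `Task` and `TaskRequest` classes, their field names, and `assign_tasks` are hypothetical, and no transport or persistence is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    id: int
    domain: str
    type: str
    data: str
    assignee: Optional[str] = None  # None means unassigned


@dataclass
class TaskRequest:
    worker: str    # identifier of the requesting worker
    count: int     # how many tasks the worker wants
    types: set     # task types the worker supports
    domains: set   # booru domains the worker supports


def assign_tasks(tasks: list, request: TaskRequest) -> list:
    """Assign up to request.count matching, unassigned tasks to the worker."""
    assigned = []
    for task in tasks:
        if len(assigned) >= request.count:
            break
        if (
            task.assignee is None
            and task.type in request.types
            and task.domain in request.domains
        ):
            task.assignee = request.worker
            assigned.append(task)
    return assigned


queue = [
    Task(1, "a.example", "scraping", "1-100"),
    Task(2, "b.example", "download", "https://b.example/1.jpg"),
    Task(3, "a.example", "scraping", "101-200"),
    Task(4, "a.example", "scraping", "201-300"),
]
request = TaskRequest("worker-1", count=2, types={"scraping"}, domains={"a.example"})
handed_out = assign_tasks(queue, request)
```

Tasks that do not match the worker's types or domains, and tasks beyond the requested count, stay unassigned for other workers.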