pastebin-scraper
Live-scraping Pastebin to fight boredom.
This is a multithreaded scraping script for Pastebin. It scrapes the main site for new pastes, downloads their raw content, and processes them according to a user-defined output format.
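To illustrate the multithreaded flow described above, here is a minimal sketch of a queue feeding download workers. It is not the scraper's actual code: `run_workers`, `fetch_raw`, and `handle_paste` are hypothetical names, and the real script may organize its threads differently.

```python
import queue
import threading


def run_workers(paste_ids, fetch_raw, handle_paste, worker_count=2):
    """Fan paste IDs out to worker threads; each worker fetches and processes one paste at a time.

    fetch_raw and handle_paste are hypothetical callables supplied by the caller.
    """
    jobs = queue.Queue()

    def worker():
        while True:
            paste_id = jobs.get()
            if paste_id is None:  # sentinel: no more work for this worker
                jobs.task_done()
                break
            try:
                handle_paste(paste_id, fetch_raw(paste_id))
            except Exception as exc:  # keep the worker alive if one paste fails
                print(f"paste {paste_id} failed: {exc}")
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for paste_id in paste_ids:
        jobs.put(paste_id)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    jobs.join()
    for thread in threads:
        thread.join()
```

In this sketch `worker_count` plays the role of the DownloadWorkers setting, and `handle_paste` stands in for whichever output (stdout, file, database) is enabled.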
WHY?
Fun.
Installation
The usual dance.
pip install -r requirements.txt
Define all required settings in settings.ini. If you choose a database output, make sure the respective connector is installed. At the moment, MySQL via pymysql and SQLite via Python 3's built-in sqlite3 module are supported.
Also note that the file output creates a subdirectory named output and dumps every paste into it as a separate file.
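A rough sketch of what such a file output could look like. The function name `write_paste` and the `<paste_id>.txt` naming scheme are assumptions made for illustration; the script's real file names may differ.

```python
from pathlib import Path


def write_paste(paste_id, content, out_dir="output", encoding="utf-8"):
    """Write one paste to its own file inside out_dir, creating the directory if needed."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme: one file per paste, named after the paste ID.
    target = directory / f"{paste_id}.txt"
    target.write_text(content, encoding=encoding)
    return target
```

Calling `write_paste("abc123", "hello")` would create `output/abc123.txt` relative to the current working directory.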
Settings
ini is a highly underrated file format. Here are some definitions of what the settings parameters actually do.
GENERAL
PasteLimit: Stop after having scraped n pastes. Set to 0 for indefinite scraping.
PBLink: URL to Pastebin or another equivalent site.
DownloadWorkers: Number of workers that download the raw paste content and further process it.
NewPasteCheckInterval: Time to wait before checking the main site for new pastes again.
IPBlockedWaitTime: Time to wait until checking the main site again after the scraper's IP has been blocked.
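As a sketch of how these GENERAL parameters could be read, the snippet below parses them with Python's standard `configparser`. The sample values are placeholders, the time unit (seconds) is an assumption, and `load_general` and `limit_reached` are hypothetical helpers rather than functions from the script.

```python
import configparser

# Placeholder values for illustration only; the time unit is assumed to be seconds.
SAMPLE = """
[GENERAL]
PasteLimit = 0
PBLink = https://pastebin.com
DownloadWorkers = 2
NewPasteCheckInterval = 60
IPBlockedWaitTime = 3600
"""


def load_general(text):
    """Parse the GENERAL section into a plain dict with typed values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    general = parser["GENERAL"]
    return {
        "paste_limit": general.getint("PasteLimit"),
        "pb_link": general.get("PBLink"),
        "download_workers": general.getint("DownloadWorkers"),
        "new_paste_check_interval": general.getint("NewPasteCheckInterval"),
        "ip_blocked_wait_time": general.getint("IPBlockedWaitTime"),
    }


def limit_reached(scraped, paste_limit):
    """A PasteLimit of 0 means scrape indefinitely, so the limit is never reached."""
    return paste_limit > 0 and scraped >= paste_limit
```

To read the real file instead of the sample string, `parser.read("settings.ini")` can replace `parser.read_string(text)`.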
LOGGING
RotationLog: Location of the log file that contains debug output.
MaxRotationSize: Size in bytes before another log file is created.
RotationBackupCount: Maximum number of log files to keep.
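These three parameters map naturally onto `logging.handlers.RotatingFileHandler` from the Python standard library. Whether the script uses that exact handler is an assumption; `build_logger` and the logger name below are hypothetical.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_logger(rotation_log, max_rotation_size, rotation_backup_count):
    """Create a debug logger that rotates its file once it reaches max_rotation_size bytes."""
    logger = logging.getLogger("pastebin-scraper-example")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    handler = RotatingFileHandler(
        rotation_log,
        maxBytes=max_rotation_size,         # MaxRotationSize
        backupCount=rotation_backup_count,  # RotationBackupCount
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With `backupCount=2`, the handler keeps the active log file plus at most two rotated backups, which get the suffixes `.1` and `.2`.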
STDOUT / FILE
Enable: Enable formatted stdout output of paste data.
ContentDisplayLimit: Maximum number of characters to show before content is cut off (0 to display all).
ShowName: Display the paste name.
ShowLang: Display the paste language.
ShowLink: Display the complete paste link.
ShowData: Display the raw paste content.
DataEncoding: Encoding of the raw paste data.
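The following sketch shows how these display options could interact. The `format_paste` function, the dict layout of a paste, and the output labels are all assumptions made for illustration, not the script's actual format.

```python
def format_paste(paste, content_display_limit=0, show_name=True, show_lang=True,
                 show_link=True, show_data=True, data_encoding="utf-8"):
    """Render a paste as text; paste is a hypothetical dict with name, lang, link, and raw bytes in data."""
    lines = []
    if show_name:
        lines.append(f"Name: {paste['name']}")
    if show_lang:
        lines.append(f"Lang: {paste['lang']}")
    if show_link:
        lines.append(f"Link: {paste['link']}")
    if show_data:
        text = paste["data"].decode(data_encoding, errors="replace")
        if content_display_limit > 0:  # 0 means display everything
            text = text[:content_display_limit]
        lines.append(text)
    return "\n".join(lines)
```

For a paste whose data is `b"hello world"`, a `content_display_limit` of 5 cuts the displayed content down to `hello`, while 0 leaves it untouched.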
MYSQL
Enable: Enable MySQL output.
TableName: Main table name to insert data into.
Host: MySQL server host.
Port: MySQL server port.
Username: MySQL server user.
Password: User password.
SQLITE
Enable: Enable SQLite output.
Filename: Filename the db should be saved as (usually ends with .db).
TableName: Main table name to insert data into.
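For orientation, here is a minimal sketch of an SQLite output built on Python's standard `sqlite3` module. The column layout and the helpers `open_db` and `insert_paste` are assumptions; the table the script actually creates may look different.

```python
import sqlite3


def _check_table_name(table_name):
    """Table names cannot be bound as SQL parameters, so only plain identifiers are accepted."""
    if not table_name.isidentifier():
        raise ValueError(f"unsafe table name: {table_name!r}")


def open_db(filename, table_name):
    """Open (or create) the database file and make sure the target table exists."""
    _check_table_name(table_name)
    conn = sqlite3.connect(filename)
    # Assumed schema for illustration only.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table_name} ("
        "id INTEGER PRIMARY KEY, name TEXT, lang TEXT, link TEXT, data TEXT)"
    )
    return conn


def insert_paste(conn, table_name, name, lang, link, data):
    """Insert one paste; values are passed as bound parameters."""
    _check_table_name(table_name)
    conn.execute(
        f"INSERT INTO {table_name} (name, lang, link, data) VALUES (?, ?, ?, ?)",
        (name, lang, link, data),
    )
    conn.commit()
```

Here `filename` and `table_name` correspond to the Filename and TableName settings; passing `":memory:"` as the filename gives a throwaway in-memory database for experiments.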
If you use this thing for some cool data analysis or even research, let me know if I can help!
Inspiration for this scraper was taken from here.