Shortest path to a functional webscraper using Selenium + Nokogiri + Rails + Heroku

1. Create a new Rails 7 application:

 rails new webscraper --database=postgresql

2. Move the following gems outside of the test group and bundle:

gem 'selenium-webdriver'
gem 'webdrivers'

3. Run the following to add Linux as a supported platform within your Gemfile.lock, this is necessary to support the Heroku deployment:

bundle lock --add-platform x86_64-linux

4. Add a new file under ./app/scrapers/scraper.rb with the following code:

class Scraper
  def scrape
    require "selenium-webdriver"
    Selenium::WebDriver::Chrome.path = ENV["GOOGLE_CHROME_BIN"] if Rails.env.production?

    arguments = %w[--headless --no-sandbox --disable-gpu]
    options = Selenium::WebDriver::Chrome::Options.new(args: arguments)
    driver = Selenium::WebDriver.for(:chrome, options: options)

    driver.get("https://quotes.toscrape.com/js/")

    doc = Nokogiri::HTML(driver.page_source)
    doc.css('.quote').each do |link|
      puts link.content
    end

    driver.quit
  end
end

5. Create a rake task under ./lib/tasks/scraper.rb with the following code:

namespace :scraper do
  desc "Scrape"
  task scrape: :environment do
    scraper = Scraper.new
    scraper.scrape
  end
end

6. Test the scraper locally:

rake scraper:scrape

7. Commit your changes:

git add -A
git commit -m "initial"

8. Next create a new Heroku app and add the following build packs:

heroku create
heroku buildpacks:add --index 1 heroku/ruby
heroku buildpacks:add --index 2 heroku/chromedriver
heroku buildpacks:add --index 3 heroku/google-chrome

9. After deploying, test the scaper in production:

heroku run rake scraper:scrape

10. If running the scraper as a recurring job, set up a new job using Heroku Scheduler.