Continuing from last time, this post builds the part that fetches drafts from Qiita.
Qiita draft scraping
Get a list of your own drafts on Qiita
As with last time, this requires logging in, so we use mechanize.
crawler.rb
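# `agent` is the logged-in Mechanize instance from last time.
page = agent.get("https://qiita.com/drafts")
doc = Nokogiri::HTML.parse(page.body, nil, 'utf-8')
# The draft data is embedded as JSON in the second react-on-rails component.
json = JSON.parse(doc.css('.js-react-on-rails-component')[1].inner_html)
json['creating_draft_items'].each do |item|
  # Only handle drafts whose body contains the marker phrase.
  if item['raw_body'].match(/Reservation posting/)
    id = item['item_uuid']
    title = item['title']
    raw_body = item['raw_body']
    tags = item['tag_notation'].split(' ')
    # Open the individual draft page.
    agent.get("https://qiita.com/drafts/#{id}")
    # Turn the space-separated tag list into an array of tag hashes.
    tag_data = []
    tags.each do |tag|
      tag_data.push({ name: tag, versions: [] })
    end
  end
end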
Add the code above by extending and modifying last time's script. It fetches the list of draft data and, for each draft whose body contains the phrase "Reservation posting", pulls out that draft's ID, title, body, and tags. Last time I specified the URL with the draft ID when fetching a draft, but /drafts redirects anyway, so this approach works.
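To try the filtering step without logging in, here is a minimal sketch that runs the same logic against a hand-written JSON payload. The key names (`creating_draft_items`, `item_uuid`, `title`, `raw_body`, `tag_notation`) come from the code above; the sample values and the `reserved_drafts` helper are hypothetical and only there for illustration.

```ruby
require 'json'

# Hypothetical sample payload; only the key names are taken from crawler.rb.
SAMPLE_JSON = <<~JSON
  {
    "creating_draft_items": [
      {"item_uuid": "aaa111", "title": "First draft",
       "raw_body": "Reservation posting: hello", "tag_notation": "Ruby mechanize"},
      {"item_uuid": "bbb222", "title": "Second draft",
       "raw_body": "Just notes", "tag_notation": "Qiita"}
    ]
  }
JSON

# Hypothetical helper: keep only the drafts whose body contains the marker
# and reshape each one into a hash with its ID, title, body, and tag data.
def reserved_drafts(json, marker: /Reservation posting/)
  json.fetch('creating_draft_items', []).filter_map do |item|
    # Returning nil here makes filter_map drop the draft.
    next unless item['raw_body'].to_s.match?(marker)

    {
      id: item['item_uuid'],
      title: item['title'],
      raw_body: item['raw_body'],
      tags: item['tag_notation'].to_s.split(' ').map { |tag| { name: tag, versions: [] } }
    }
  end
end

drafts = reserved_drafts(JSON.parse(SAMPLE_JSON))
drafts.each do |draft|
  tag_names = draft[:tags].map { |tag| tag[:name] }.join(', ')
  puts "#{draft[:id]}: #{draft[:title]} (#{tag_names})"
end
```

Returning plain hashes keeps the scraping step separate from whatever the posting part ends up needing, so the filter can be checked on its own. Note that `filter_map` needs Ruby 2.7 or later.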
【Next time】 I'll finally build the posting part, but it looks harder than I expected... I may end up relying on Selenium...