This article is a relay entry in Link Information Systems' "2020 New Year Advent Calendar TechConnect!". TechConnect! is a self-initiated Advent calendar run relay-style by a volunteer group called engineer.hanzomon. (Link Information Systems' Facebook page is [here](https://ja-jp.facebook.com/lis.co.jp/).)
This article is for the 7th day, 1/15 (Wednesday).
As I mentioned in a past article, I have ended up as the person in charge of counting the likes on our Advent calendar. (I did not write an article about the tally for the previous calendar, but I collected it with shell one-liners.) Since I was going to automate the counting anyway, I thought it would be ~~fun~~ nice to be notified, Qiita-milestone style, whenever an article's likes pass a certain number, so I decided to run it on AWS Lambda.
The implementation scrapes the Advent calendar top page to collect the article URLs, then gets the number of likes for each article with the Qiita API. First, here is the Lambda function that collects the article IDs.
import os

import boto3
from bs4 import BeautifulSoup
from selenium import webdriver


def lambda_handler(event, context):
    driver = None  # defined up front so the finally block never hits an undefined name
    try:
        dynamoDB = boto3.resource("dynamodb")
        advent_calendar = dynamoDB.Table("advent_calendar")

        # Headless Chromium and chromedriver binaries are expected under /opt/bin
        options = webdriver.ChromeOptions()
        options.binary_location = "/opt/bin/headless-chromium"
        options.add_argument("--headless")
        options.add_argument("--disable-gpu")
        options.add_argument("--window-size=1280x1696")
        options.add_argument("--disable-application-cache")
        options.add_argument("--disable-infobars")
        options.add_argument("--no-sandbox")
        options.add_argument("--hide-scrollbars")
        options.add_argument("--enable-logging")
        options.add_argument("--log-level=0")
        options.add_argument("--single-process")
        options.add_argument("--ignore-certificate-errors")
        options.add_argument("--homedir=/tmp")
        # Selenium 3 style constructor, as in the original code
        driver = webdriver.Chrome(executable_path="/opt/bin/chromedriver", options=options)

        # Render the calendar page, then parse the resulting HTML
        driver.get(os.environ['TARGET_URL'])
        soup = BeautifulSoup(driver.page_source, 'html.parser')
        item = soup.find('div', id='personal-public-article-body')
        tables = item.find_all('tbody')
        for table in tables:
            rows = table.find_all('tr')
            for row in rows:
                cells = row.find_all('td')
                link = cells[2].find('a')
                if link is None:
                    continue  # row has no article link, so there is nothing to register
                user_id = cells[1].text
                href = link['href']
                # Keep only the "items/<id>" part, which doubles as the API path
                item_id = href[href.find('items/'):]
                response = advent_calendar.get_item(
                    Key={
                        'user_id': user_id,
                        'item_id': item_id
                    }
                )
                # Register articles seen for the first time with zero likes
                if 'Item' not in response:
                    advent_calendar.put_item(
                        Item={
                            "user_id": user_id,
                            "item_id": item_id,
                            'likes': 0
                        }
                    )
    except Exception as e:
        print(e)
    finally:
        if driver is not None:
            driver.quit()
    return
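The step that turns an article link into an `items/...` ID can be tried on its own. The following is a minimal sketch, assuming article links look like `https://qiita.com/<user>/items/<id>`; the helper names `extract_item_id` and `api_url_for` are hypothetical and are not part of the Lambda function. It returns `None` when the link contains no `items/` segment, because `str.find` returns -1 in that case and slicing from -1 would silently keep only the last character.

```python
from typing import Optional
from urllib.parse import urljoin, urlparse

API_ENDPOINT = "https://qiita.com/api/v2/"


def extract_item_id(href: str) -> Optional[str]:
    """Return the 'items/<id>' part of an article URL, or None if absent."""
    path = urlparse(href).path  # drops any query string or fragment
    pos = path.find("items/")
    if pos == -1:
        return None
    return path[pos:].rstrip("/")


def api_url_for(item_id: str) -> str:
    """Build the Qiita API URL for an 'items/<id>' string."""
    # The endpoint ends with '/', so urljoin appends instead of replacing
    return urljoin(API_ENDPOINT, item_id)
```

Because the stored ID already starts with `items/`, it can be joined to the API endpoint as-is.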
Next, here is the function that calls the Qiita API for each collected article ID and gets the number of likes.
import os
import smtplib
from email.message import EmailMessage
from urllib.parse import urljoin

import boto3
import requests

# Like counts that trigger a notification, checked from highest to lowest
MILESTONES = (100, 50, 30, 10)


def lambda_handler(event, context):
    api_endpoint = 'https://qiita.com/api/v2/'
    headers = {'Authorization': 'Bearer ' + os.environ['QIITA_AUTH']}
    dynamoDB = boto3.resource("dynamodb")
    advent_calendar = dynamoDB.Table("advent_calendar")
    smtp = None  # defined up front so the finally block never hits an undefined name
    try:
        smtp = smtplib.SMTP_SSL(os.environ['SMTP_HOST'], int(os.environ['SMTP_PORT']))
        smtp_user = os.environ['SMTP_USER']
        smtp_pass = os.environ['SMTP_PASS']
        message = EmailMessage()
        message['From'] = os.environ['FROM_ADDRESS']
        message['To'] = os.environ['TO_ADDRESS']
        message['Subject'] = 'Advent Calendar Like Monitoring'
        smtp.login(smtp_user, smtp_pass)

        response = advent_calendar.scan()
        for i in response['Items']:
            user_id = i['user_id']
            item_id = i['item_id']
            old_likes = int(i['likes'])

            # Fetch the current article details
            item_url = urljoin(api_endpoint, item_id)
            item_detail = requests.get(item_url, headers=headers).json()
            title = item_detail['title']
            url = item_detail['url']
            new_likes = int(item_detail['likes_count'])
            comments = int(item_detail['comments_count'])

            # Count stockers from one page of results (at most 100)
            stockers_url = urljoin(api_endpoint, item_id + '/stockers?per_page=100')
            stockers = len(requests.get(stockers_url, headers=headers).json())

            # Notify only for the highest milestone crossed since the last run
            for milestone in MILESTONES:
                if old_likes < milestone <= new_likes:
                    message.set_content(
                        f"{user_id}: article '{title}' ({url}) has reached {milestone} likes"
                    )
                    smtp.send_message(message)
                    break

            # Store the latest counts for the next comparison
            advent_calendar.put_item(
                Item={
                    "user_id": user_id,
                    "item_id": item_id,
                    "likes": new_likes,
                    "comments": comments,
                    "stockers": stockers
                }
            )
    except Exception as e:
        print(e)
    finally:
        if smtp is not None:
            smtp.close()
    return
Actually, I wanted to send the notifications to Microsoft Teams, which we use as our internal chat, but I could not get it working because my Office 365 authentication uses two-step verification. For now, it just sends mail to my own email address. I thought about forwarding it automatically from Outlook, but I could not set that up because I lack the permissions. Oh well.
It still feels a little incomplete, but I was able to automate the collection of likes. Once the calendar is over and things settle down, I plan to pull the data from DynamoDB and tally the final results. At first I was thinking of using ZABBIX's HTTP Agent, but since the EC2 free tier is no longer available to me, I decided to use Lambda + DynamoDB. The free tier is the best.
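The final tally mentioned above could be sketched roughly as follows. This is only an illustration, assuming the table keeps the `user_id`, `item_id`, and `likes` attributes used in this article; `scan_all`, `rank_by_likes`, and `FakeTable` are hypothetical names, and `FakeTable` is a stand-in so the sketch runs without AWS. A DynamoDB `scan` returns results in pages, so the sketch follows `LastEvaluatedKey` until no further page remains.

```python
from typing import Any, Dict, Iterable, Iterator, List


def scan_all(table: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item, following LastEvaluatedKey across scan pages."""
    kwargs: Dict[str, Any] = {}
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        kwargs["ExclusiveStartKey"] = last_key


def rank_by_likes(items: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sort items by like count, highest first."""
    return sorted(items, key=lambda i: int(i.get("likes", 0)), reverse=True)


class FakeTable:
    """Stand-in for a boto3 Table that serves pre-built pages (made-up data)."""

    def __init__(self, pages: List[List[Dict[str, Any]]]) -> None:
        self._pages = pages

    def scan(self, **kwargs: Any) -> Dict[str, Any]:
        index = kwargs.get("ExclusiveStartKey", {}).get("page", 0)
        page: Dict[str, Any] = {"Items": self._pages[index]}
        if index + 1 < len(self._pages):
            page["LastEvaluatedKey"] = {"page": index + 1}
        return page
```

With a real table, `rank_by_likes(scan_all(boto3.resource("dynamodb").Table("advent_calendar")))` would produce the same kind of ranking.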
Tomorrow's article is by @h-yamasaki.
Update 1/17: I also modified the function to collect the number of comments and stocks. This increased the number of API calls, and it looked like I would hit the limit of 1000 requests per hour, so I changed the CloudWatch Events monitoring interval from 1 minute to 5 minutes.
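The arithmetic behind that interval change can be checked with a short sketch. It assumes two API calls per article per run (the article detail and the stockers list, as in the function above) and uses a hypothetical count of 25 articles, since the actual number is not stated here. The function names are illustrative only.

```python
RATE_LIMIT_PER_HOUR = 1000  # hourly limit quoted in this article


def calls_per_hour(articles: int, calls_per_article: int, interval_minutes: int) -> int:
    """Total API calls per hour for a schedule that runs every interval_minutes."""
    runs_per_hour = 60 // interval_minutes  # assumes the interval divides 60
    return articles * calls_per_article * runs_per_hour


def max_articles(calls_per_article: int, interval_minutes: int) -> int:
    """Largest number of articles that stays within the hourly limit."""
    runs_per_hour = 60 // interval_minutes
    return RATE_LIMIT_PER_HOUR // (calls_per_article * runs_per_hour)


# Hypothetical 25 articles, 2 calls each
every_minute = calls_per_hour(25, 2, 1)        # 25 * 2 * 60 = 3000
every_five_minutes = calls_per_hour(25, 2, 5)  # 25 * 2 * 12 = 600
```

Under these assumptions, a 1-minute interval stays within the limit for only 8 articles, while a 5-minute interval leaves room for 41.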