This year, I made a CLI tool called deck2pdf that captures HTML slides and converts them to PDF.
The basic approach is the one that usually turns up when you search for this: capture each page as a PNG, then join them all into a PDF. In the version 0 series I implemented this with a mix of tools.
So, wanting to consolidate as much as possible into Python packages, I went looking and found Ghost.py.
Ghost.py is a WebKit client written in Python that offers sessions, JavaScript evaluation, screen captures and more. I haven't compared it in detail with PhantomJS and similar tools, but it handled the minimum behavior I needed without trouble (although I did get stuck at one point), so I used it as is in my own package.
Ghost.py uses Qt, so you need PySide or PyQt. This time I tried using PySide.
$ brew install qt
$ pip install PySide==1.2.2
$ pyside_postinstall.py -install
$ pip install Ghost.py
The latest version of PySide at the time of writing is actually 1.2.4, but there is no Mac wheel for 1.2.4, so pip appears to build it from source and the install takes a long time. If you don't have PySide installed yet and just want to try things out, installing 1.2.2 as shown above is faster.
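To check whether a Qt binding is already importable before installing anything, a small helper along these lines may help. This is only a sketch: the function name find_qt_binding is hypothetical, and the default module names PySide and PyQt4 are assumptions about what the installed Ghost.py release looks for.

```python
import importlib


def find_qt_binding(candidates=("PySide", "PyQt4")):
    """Return the name of the first importable module in candidates, or None.

    The default candidates are an assumption; adjust them to whatever
    binding your Ghost.py version expects.
    """
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            # This binding is not available, try the next one.
            continue
        return name
    return None


if __name__ == "__main__":
    print(find_qt_binding() or "no Qt binding found")
```

If this reports no binding, the install steps above are the next thing to try.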
First, start Ghost
>>> from ghost import Ghost
>>> ghost = Ghost()
>>> session = ghost.start()
A client-side process starts when you create a Ghost instance. Call the start() method to create a session instance.
Visit the HTML5 slides demo page
>>> resp = session.open('http://html5slides.googlecode.com/svn/trunk/template/index.html')
2015-12-20T18:02:12.662Z [WARNING ] QT: libpng warning: iCCP: known incorrect sRGB profile
2015-12-20T18:02:12.746Z [WARNING ] QT: libpng warning: iCCP: known incorrect sRGB profile
>>> type(resp)
<type 'tuple'>
>>> len(resp)
2
>>> resp[0]
<ghost.ghost.HttpResource object at 0x10a99f510>
>>> resp[1]
[<ghost.ghost.HttpResource object at 0x10a99f510>, <ghost.ghost.HttpResource object at 0x10a99f410>, <ghost.ghost.HttpResource object at 0x10a99f610>, <ghost.ghost.HttpResource object at 0x10a99f750>, <ghost.ghost.HttpResource object at 0x10a99f8d0>, <ghost.ghost.HttpResource object at 0x10a99f910>, <ghost.ghost.HttpResource object at 0x10a99fb10>, <ghost.ghost.HttpResource object at 0x10a99fc10>, <ghost.ghost.HttpResource object at 0x10a99fa10>, <ghost.ghost.HttpResource object at 0x10a99fd10>]
>>>
>>> resp[0].url
u'http://html5slides.googlecode.com/svn/trunk/template/index.html'
>>> resp[1][0].url
u'http://html5slides.googlecode.com/svn/trunk/template/index.html'
>>> resp[1][1].url
u'http://html5slides.googlecode.com/svn/trunk/slides.js'
>>> resp[1][2].url
u'http://fonts.googleapis.com/css?family=Open+Sans:regular,semibold,italic,italicsemibold|Droid+Sans+Mono'
This is hard to see in an interactive shell, but session.open(url) requests and fetches the given URL together with all the resources referenced by its content. The return value is a tuple of the main page resource and a list of every resource that was fetched.
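To make the shape of that return value easier to read than the raw object reprs, the tuple can be unpacked into plain URLs. The following is a minimal sketch; list_resource_urls is a hypothetical helper name, and it relies only on the .url attribute shown in the transcript above.

```python
def list_resource_urls(response):
    """Split the (page, resources) tuple returned by session.open().

    Returns the main page URL and a list with the URL of every fetched
    resource. Only the .url attribute is used.
    """
    page, resources = response
    return page.url, [resource.url for resource in resources]


# Usage with a live session (not run here):
#   main_url, urls = list_resource_urls(session.open(url))
#   for u in urls:
#       print(u)
```

The helper does not handle a failed page load, so a real tool would want to check the page object before reading its URL.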
>>> session.capture_to('capture_1.png')
You can take a screenshot with the capture_to method. However, if you don't specify the capture region properly, the result is a mess. Since the viewport size can be set, it may be better to fix it in advance.
>>> session.capture_to('capture_2.png', region=(1940, 0, 3000, 740))
>>>
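The region tuple above reads as (left, top, right, bottom) rather than (left, top, width, height), which is easy to get wrong. As a small sketch under that assumption, a hypothetical helper named region_from_size can build the tuple from a corner and a size:

```python
def region_from_size(left, top, width, height):
    """Build an (x1, y1, x2, y2) tuple from a top-left corner and a size."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (left, top, left + width, top + height)


# The region used above is a 1060x740 box whose left edge is at x=1940.
region = region_from_size(1940, 0, 1060, 740)
print(region)  # (1940, 0, 3000, 740)
```

The result can be passed directly as the region argument of capture_to.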
Ghost.py can also run JavaScript directly in the session.
Advance the html5slides deck to the next slide
>>> session.evaluate('nextSlide()')
(None, [])
>>> session.capture_to('capture_3.png', region=(1940, 0, 3000, 740))
Even though I called the function that advances the slide and then captured, the slide did not advance. (If you run nextSlide in ordinary Chrome or similar, the slide advances without problems.)
As I found while building deck2pdf, a Ghost.py session appears to leave the passage of time to the code that drives it. If you do nothing, JavaScript invoked through evaluate does not take effect until time is allowed to advance.
>>> session.sleep(1)
>>> session.capture_to('capture_4.png', region=(1940, 0, 3000, 740))
Here is the result after sleeping for 1 second. The slide content was captured without any problems.
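Putting the steps above together, capturing a whole deck comes down to repeating capture, advance, and sleep. The following is a rough sketch rather than how deck2pdf actually does it: capture_slides, its parameters, and the output file name pattern are all hypothetical, and the session object only needs the evaluate, sleep, and capture_to methods used in the transcript above.

```python
def capture_slides(session, count, region=None, delay=1.0,
                   advance_js="nextSlide()", pattern="slide_{:03d}.png"):
    """Capture `count` slides, advancing with `advance_js` between captures.

    Returns the list of file names that were written, in slide order.
    """
    paths = []
    for index in range(count):
        path = pattern.format(index + 1)
        session.capture_to(path, region=region)
        paths.append(path)
        if index < count - 1:
            session.evaluate(advance_js)
            # Give the session time to process the slide transition.
            session.sleep(delay)
    return paths


# Usage with a live session (not run here):
#   capture_slides(session, 3, region=(1940, 0, 3000, 740))
```

The number of slides has to be supplied by the caller here, and the one-second default delay is just the value that worked in the session above.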
There are some quirks like this, but Ghost.py can do most of what WebKit itself can do, so it looks like there is plenty to play with once you get used to it.
It seems there will be no shortage of project ideas, such as a mostly-Python SpeakerDeck clone or building slides from an archive of bundled HTML.