Friday, August 26, 2016
Export Blogger archive to Google Calendar or iCal
Here is a way to export a Blogger archive file to your Google Calendar, or any iCal-based calendar program.
Background: Blogger .xml archive -> iCal .ics file
Basically, I was looking for a way to recreate the sort of "Timehop" feature where what you were doing on social media "On this Day..." would resurface on that day years later. You know, like with Facebook's "On this Day" feature or Google Photos' "Rediscover this Day". But I couldn't find any such thing for a Blogger blog, and you might have years of old embarrassing posts that you'd love to relive. Why leave them just sitting unread? Relive them each day by having those old posts appear (either as links or full entries) in your calendar.
So I haphazardly found a way to import those posts into Calendar, so that I can set them as recurring events. Now, I'm no programmer or app author. Basically, I just manually edited my Blogger archive file with a bunch of Find/Replace commands until it met the standards of an iCal file for importing to Google Calendar. In other words, I manually transformed the file from one type to another. It's a totally do-it-yourself way.
Like I said, I'm no expert, and I am 100% sure there are better ways to do this. Someone much smarter than me could probably write a self-contained program or macro to do it automatically. But I've never found a program that converts Blogger archive .xml files to iCal .ics files, so I had to make do with my own limited knowledge.
I'm hoping that following these copy/paste instructions can save you the hassle of figuring out the code manually, and can help semi-automate the process so that you can convert the file quickly and hopefully painlessly. Just copy/paste these strings into the Find/Replace box and in about 5 to 10 minutes you should be ready to go.
Tool: Notepad++
The only tool you will need is a text editor program called Notepad++.
I chose this because it's free, open-source, and works great with regular expressions. I used version 6.8.8.
This is the Find/Replace dialog box in Notepad++. We are going to basically use this and only this. A lot. So get comfortable with it.
And before we start, double-check that "Regular expression" is selected as the search mode and ". matches newline" is checked.
The iCal format
So a basic iCal file follows this format. I need to make my blog archive file look like this. This will be the template style that we're aiming for.
BEGIN:VCALENDAR
PRODID:<Test>
VERSION:2.0
BEGIN:VEVENT
DTSTART;VALUE=DATE:20160130
RRULE:FREQ=YEARLY
DESCRIPTION:here is some entry content. Looking good my man.
SUMMARY:here is the event title aka blog post title
END:VEVENT
END:VCALENDAR
As far as I could tell, Google will only successfully import the file if you have these items as a minimum. OK, let's start.
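If it helps to see the template as something you can generate, here is a minimal Python sketch that builds the same structure from a date, a title, and some content. The function names make_event and make_calendar are made up for this illustration and are not part of any calendar library.

```python
# Minimal sketch: build the same bare-bones iCal text as the template above.
# make_event and make_calendar are hypothetical helper names for illustration.
# Note: the iCalendar spec calls for CRLF line endings; plain newlines are
# used here to mirror the hand-edited file in this post.

def make_event(date_yyyymmdd, title, description, yearly=True):
    """Return one VEVENT block as text."""
    lines = ["BEGIN:VEVENT", f"DTSTART;VALUE=DATE:{date_yyyymmdd}"]
    if yearly:
        lines.append("RRULE:FREQ=YEARLY")
    lines.append(f"DESCRIPTION:{description}")
    lines.append(f"SUMMARY:{title}")
    lines.append("END:VEVENT")
    return "\n".join(lines)


def make_calendar(events, prodid="<Test>"):
    """Wrap VEVENT blocks in the calendar header and footer."""
    header = ["BEGIN:VCALENDAR", f"PRODID:{prodid}", "VERSION:2.0"]
    return "\n".join(header + list(events) + ["END:VCALENDAR"]) + "\n"


ics = make_calendar([
    make_event(
        "20160130",
        "here is the event title aka blog post title",
        "here is some entry content. Looking good my man.",
    )
])
print(ics)
```

The rest of this post gets to the same result by editing the archive file by hand instead.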
Open the Blogger archive .xml file in Notepad++.
Part 1 - Preliminary file clean-up
Step 1.1 - Beautify (Optional)
You don't have to do this, but I recommend it. The big block of code on your screen is ugly, confusing, and unorganized, and it's difficult to see where items start and stop. So I suggest installing the "XML Tools" plugin via Notepad++'s Plugin Manager.
Once it's installed, find it in the Plugins menu and choose the menu option "Pretty Print - XML only". Now the code looks organized and clear.
Step 1.2 - Clear the junk out
OK, now let's start the editing. Remember, for everything we do, ensure in the Find/Replace box that "Regular expression" is selected and the ". matches newline" box is checked.
[We just want your entry data, but the Blogger archive file includes lots of information about your settings, template, etc. It seems to store this data as unused blog "entries." Since we don't need that data and only want the post content, let's remove it all.]
Use the "Find..." command to find this in the file:
BLOG_USE_LIGHTBOX
You'll probably get back three results, but all are in the same blog entry (i.e. between a pair of <entry> and </entry> tags). Whichever entry includes this BLOG_USE_LIGHTBOX code will be the last of the useless, unused blog entries, meaning your first real blog post starts after this entry.
So with your eyes, look down a few lines from the last BLOG_USE_LIGHTBOX and find the first <entry> tag just below. For my test archive file, this was around line 3460. There are lots of <entry> tags, so make sure you've got the right one.
Now delete EVERYTHING above that <entry> tag.
You'll be left with just actual blog posts.
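For anyone who would rather script this step, the same idea can be sketched in Python: find the last BLOG_USE_LIGHTBOX, then keep everything from the next <entry> tag onward. The function name drop_settings_entries is my own invention, and the sketch assumes the entry tag is written exactly as <entry> with no attributes.

```python
# Rough sketch of Step 1.2. drop_settings_entries is a hypothetical helper;
# it assumes the opening tag appears literally as "<entry>".

def drop_settings_entries(xml_text, marker="BLOG_USE_LIGHTBOX"):
    """Keep everything from the first <entry> after the last marker."""
    last_marker = xml_text.rfind(marker)
    if last_marker == -1:
        return xml_text  # marker not found: leave the text untouched
    first_post = xml_text.find("<entry>", last_marker)
    if first_post == -1:
        return xml_text  # no post after the marker: leave the text untouched
    return xml_text[first_post:]


sample = (
    "<feed><entry><id>settings</id>"
    "<content>BLOG_USE_LIGHTBOX</content></entry>"
    "<entry><title type='text'>First post</title></entry></feed>"
)
print(drop_settings_entries(sample))
# prints: <entry><title type='text'>First post</title></entry></feed>
```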
Step 1.3 - Delete useless blog post info
Now we start using Find/Replace to remove the bits that are useless to us. So use the Find/Replace function to Find each of these items (yes, one at a time, because I'm not a programmer) and Replace them with nothing (leave the Replace box blank).
These functions will find these tags and all the content between them, and remove it. Just paste each of these one at a time into the "Find" box, make sure the "Replace" box is empty, and hit the "Replace All" button. Repeat for each:
- <id.*?</id>
- <author.*?</author>
- <updated.*?/>
- <media.*?/>
- <category.*?/>
- </title>
- <link rel='edit'.*?/>
- <link rel='self'.*?/>
- <link rel='replies'.*?/>
- <thr.*?/thr:total>
- <thr:in-reply-to.*?/>
- <gd:extendedProperty.*?/>
Note: depending on your blog some of these items might not be found anyway. No problem.
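The same batch of deletions can be sketched as a short Python loop. This is only an illustration under a couple of assumptions: the attribute values in the archive are wrapped in single quotes (rel='edit' and so on), and re.DOTALL stands in for the ". matches newline" checkbox. The sample entry is made up.

```python
import re

# Patterns from the list above, applied in the same order.
# Assumption: attribute values use single quotes in the archive file.
JUNK_PATTERNS = [
    r"<id.*?</id>",
    r"<author.*?</author>",
    r"<updated.*?/>",
    r"<media.*?/>",
    r"<category.*?/>",
    r"</title>",
    r"<link rel='edit'.*?/>",
    r"<link rel='self'.*?/>",
    r"<link rel='replies'.*?/>",
    r"<thr.*?/thr:total>",
    r"<thr:in-reply-to.*?/>",
    r"<gd:extendedProperty.*?/>",
]


def strip_junk(text):
    """Delete every match of every pattern (like Replace All with an empty box)."""
    for pattern in JUNK_PATTERNS:
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    return text


# A made-up entry, only for demonstration.
sample = (
    "<entry><id>tag:blogger.com,1999:post-1</id>"
    "<published>2016-01-30T09:00:00.000-08:00</published>"
    "<updated>2016-01-31T10:00:00.000-08:00</updated>"
    "<category scheme='s' term='t'/>"
    "<title type='text'>Hello</title>"
    "<content type='html'>Body</content>"
    "<link rel='edit' type='application/atom+xml' href='http://example.com/e'/>"
    "<link rel='alternate' type='text/html' href='http://example.com/p' title='Hello'/>"
    "<author><name>Me</name></author></entry>"
)
cleaned = strip_junk(sample)
print(cleaned)
```

Patterns that match nothing are simply skipped, same as in Notepad++.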
Part 2 - Start replacing the tags
Step 2.1 - Location data
You should decide if you want your blog posts' geotag/location data kept and used as the location for the calendar events.
Step 2.1 A - Preserve it!
If you want this preserved, do this:
Find:
<georss:featurename>
and Replace it with:
LOCATION:
and
Find:
</georss:featurename>.*?</georss:box>
and Replace it with nothing (i.e. leave the box blank)
Step 2.1 B - Remove it
If you do not want locations included, or if you never geotagged your blog posts, then you can remove all location info with this:
Find:
<georss:featurename>.*?</georss:box>
and Replace it with nothing
Step 2.2 - Replace Blogger tags with iCal-friendly tags
Now it's time to do some replacement. Just Find/Replace these sets:
Find:
<published>
and Replace it with:
DTSTART;VALUE=DATE:
This will set the calendar event to the date of the blog post. I opted to use the post's publish date (whether you actually published it then or manually back/forward-dated it) instead of the date the entry was last updated.
Find:
<entry>
and Replace with:
BEGIN:VEVENT
and
Find:
</entry>
and Replace with:
END:VEVENT
These will make each blog post its own event.
Find:
</feed>
and Replace with:
END:VCALENDAR
This will mark the end of the blog archive as the end of your imported calendar data.
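These four swaps are plain text replacements with no regex needed, so in Python they could be sketched with str.replace. The name swap_tags and the sample text are made up for illustration.

```python
# Sketch of Step 2.2 as literal replacements. swap_tags is a hypothetical name.
TAG_MAP = [
    ("<published>", "DTSTART;VALUE=DATE:"),
    ("<entry>", "BEGIN:VEVENT"),
    ("</entry>", "END:VEVENT"),
    ("</feed>", "END:VCALENDAR"),
]


def swap_tags(text):
    """Replace each Blogger tag with its iCal counterpart."""
    for old, new in TAG_MAP:
        text = text.replace(old, new)
    return text


sample = (
    "<entry>\n"
    "<published>2016-01-30T09:00:00.000-08:00</published>\n"
    "</entry>\n"
    "</feed>"
)
print(swap_tags(sample))
```

The leftover time and closing </published> tag on the DTSTART line get cleaned up later, in Part 4.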
Step 2.3 - Blog post title as event title
Find:
<title type='text'>
and Replace with:
SUMMARY:
That will set the blog post's title as the event title. So what you see as the entry on your calendar will be this. You don't have to do this, of course. We're starting to get into the "what you feel like" part.
Optionally, you could add some sort of prefix here if you wanted. For example, instead of replacing with just SUMMARY: you could use SUMMARY:Blog Post - so your calendar entries are more visually distinguished from other normal calendar events.
Part 3 - Calendar entry content
Step 3.1 - Choose the entry content: Link or post?
Now we need to decide what's going to go in the event description.
- Do you want the entire post's content in there, so that you can read the whole post in your calendar?
- Or do you want just a link to your original blog post?
Step 3.1 A - Blog content as Description
If you want the entirety of each post's content copied to each corresponding calendar entry, do this:
Find:
<content type='html'>
and Replace with:
DESCRIPTION:
Then Find the following items, Replacing each with nothing (i.e. leave the box blank):
- </content>
- <link rel='alternate' type='text/html' href=
- title=.*?/>
This will leave a link to the original post at the end of the entry. If your post was a draft in Blogger, it won't have a link because it was never published.
Step 3.1 B - A link back to original post as Description
If you just want a link back to the original post in your calendar event:
Find:
<link rel='alternate' type='text/html' href=
and Replace with:
DESCRIPTION:
Then Find the following items, Replacing each with nothing (i.e. leave the box blank):
- title=.*?/>
- <content type=.*?</content>
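The link-only option can be sketched in Python like this. It assumes single-quoted attributes in the archive, and it removes the content first rather than last, which is a small change from the order above so that any "title=" text inside a post body cannot confuse the later pattern. The name link_as_description and the sample are hypothetical.

```python
import re

# Assumption: the archive writes the link tag with single-quoted attributes.
LINK_PREFIX = "<link rel='alternate' type='text/html' href="


def link_as_description(text):
    """Keep only the post link and label it as the event description."""
    # Remove the post body first (a reordering of the steps above).
    text = re.sub(r"<content type=.*?</content>", "", text, flags=re.DOTALL)
    text = text.replace(LINK_PREFIX, "DESCRIPTION:")
    return re.sub(r"title=.*?/>", "", text, flags=re.DOTALL)


sample = (
    "<content type='html'>Body text</content>\n"
    "<link rel='alternate' type='text/html' "
    "href='http://example.com/p' title='Hello'/>\n"
)
result = link_as_description(sample)
print(result.strip())
# prints: DESCRIPTION:'http://example.com/p'
```

Note that the quotes around the URL are left in place, just as they are with the manual Find/Replace.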
Part 4 - Clean Up
Now we need to clean up the number formats to make the Blogger timestamps fit well with a Calendar app.
One problem is that your blog archive file has whatever time zone setting your blog had. So the publish times are going to be off. But I don't really care about accurate hours, just accurate dates. So I'm going to make my life simpler and just remove the timestamps.
Optional: You could edit this to keep the timestamps and, for example, just change the time zone marker to "Z" so it thinks the posting time was in GMT. That's easiest. Then you'd have to remove the colons separating the hour:minute:seconds, and go back and remove ";VALUE=DATE".
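If you did want real times, a script could do a proper conversion rather than relabeling the zone. This is a different approach from the Find/Replace trick just described: a small Python sketch that converts a Blogger-style timestamp to UTC and prints it in the compact form calendars use. The function name to_utc_stamp and the sample timestamp are made up, and it assumes the timestamp looks like the ones in my archive file.

```python
from datetime import datetime, timezone


def to_utc_stamp(published):
    """Convert e.g. 2016-01-30T09:00:00.000-08:00 to a UTC iCal date-time."""
    local = datetime.fromisoformat(published)  # keeps the -08:00 offset
    return local.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


print("DTSTART:" + to_utc_stamp("2016-01-30T09:00:00.000-08:00"))
# prints: DTSTART:20160130T170000Z
```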
But I just want the dates (this will make the blog post an "all day" event on your calendar).
So let's remove the timestamps and clean up the datestamps. But before that, decide:
Step 4.1 - Repeat or Not?
Decide if you just want your blog posts exported to the calendar on the dates when they were posted, or if you want them to repeat annually. I like that whole "time hop" or "On this Day" feeling, so I prefer to have them repeat annually.
Step 4.1 A - No repeat
For no repeat, and just a proper archive, then run this Find/Replace task:
Find:
T(\d+):.*?</published>
and Replace with nothing (i.e. leave the box blank)
Step 4.1 B - Repeat annually
But if, like me, you like the whole "time hop" reminder, and would like to see each post repeat on its same day each year, run this task instead. I highly recommend this, as it's a great way to revisit your content.
Find:
T(\d+):.*?</published>
and Replace with:
\nRRULE:FREQ=YEARLY
(The \n at the front puts the rule on its own line, right under the DTSTART line, like in the template.)
and
Find:
(\d+)-(\d+)-(\d+)
and Replace it with:
$1$2$3
This will remove the hyphens from the date format Blogger uses. We need just a pure series of numbers, so run this one even if you chose no repeat in Step 4.1 A. Hat tip to http://stackoverflow.com/a/25627871
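Here is how both date clean-up tasks might look in Python, using the regular expressions with the \d digit class. It is a sketch, not a tested converter: clean_dates is a made-up name, and it assumes the RRULE belongs on its own line under DTSTART, as in the template at the top of this post.

```python
import re


def clean_dates(text, repeat_yearly=True):
    """Drop the time part, optionally add a yearly rule, remove date hyphens."""
    tail = "\nRRULE:FREQ=YEARLY" if repeat_yearly else ""
    # Cut everything from the "T" before the hour through </published>.
    text = re.sub(r"T(\d+):.*?</published>", tail, text, flags=re.DOTALL)
    # 2016-01-30 becomes 20160130.
    return re.sub(r"(\d+)-(\d+)-(\d+)", r"\1\2\3", text)


sample = "DTSTART;VALUE=DATE:2016-01-30T09:00:00.000-08:00</published>"
print(clean_dates(sample))
# prints:
# DTSTART;VALUE=DATE:20160130
# RRULE:FREQ=YEARLY
```

One thing to watch with the hyphen pattern, in Notepad++ or in Python: it will also change any other number-hyphen-number-hyphen-number text in the file, such as a date typed inside a post.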
Step 4.2 - Header
Congrats, we're almost done. Now just manually go add this to the very top of the page:
BEGIN:VCALENDAR
PRODID:<Test>
VERSION:2.0
Finally, it's time to clean out any messy tab spaces that are left over. It's important that each item be at the start of a new line. There can be extra blank lines between, but everything needs to be as far left as possible. So let's remove any errant tabs and spaces at the start of each line:
Find:
^[ \t]+
and Replace with nothing (i.e. leave the box blank)
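As a quick illustration of what this clean-up does, here is the same idea in Python: strip tabs and spaces only at the start of each line, so spaces inside titles and descriptions are left alone. The function name and sample text are made up for the example.

```python
import re


def strip_leading_whitespace(text):
    """Remove tabs and spaces at the start of every line."""
    return re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)


sample = "\t\tBEGIN:VEVENT\n    SUMMARY:Hello world\n\tEND:VEVENT\n"
print(strip_leading_whitespace(sample))
# prints:
# BEGIN:VEVENT
# SUMMARY:Hello world
# END:VEVENT
```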
Step 4.3 - Save
Now just save the file as plain text ("Normal Text File" in the drop-down menu in the Save dialog box).
Before saving, rename the extension from .txt to .ics
Step 4.4 - Import file
You can now import the file into your Google Calendar or whatever calendar app.
Final Thoughts
Play around with this and find the method that works best for you. For example, you might want to better format the post content if you chose to show the whole original post inside the calendar. Links in your original blog post will stay in the calendar event description (if you chose to keep the full content), but images of course won't display (links to the images will be there though).
Just have fun, and I hope you found this helpful. I'm no programmer; I just spent a few hours playing around with this. It's a good way to resurface old memories, and it gives another back-up option besides just your hosted blog or a dead archive file sitting on your hard drive.
Good luck and enjoy revisiting all those old blog memories.
=====
This post is from the blog 10₩ Tips, by Sam Nordberg. See the original there, and follow me on Facebook or Twitter @10wontips.