There may be times when you need to extract just the text from a blob of HTML copied from the page source, because the content couldn't be copied directly or the text on the web page was hidden. Recently, I wanted to get the subtitles of a YouTube video, but they weren't easy to copy from the transcript. I also couldn't locate the timedtext file that contains the subtitles, so I had to point at the Transcript block using Developer Tools (F12 keyboard shortcut) and get the HTML.
Here's the trick I tried -
Now that I had the text in HTML format, I pasted it into Excel, pressed Ctrl+H to open the Replace dialog box, typed <*> in the Find what box, left the Replace with box blank and hit the Replace All button. Excel treats * in the Find what box as a wildcard, so <*> matches whole tags. That removed all the tags along with their attributes and left just the text.
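If you don't have Excel handy, the same idea can be sketched in a few lines of Python using only the standard library. This is a rough equivalent rather than what I actually did: the names TextExtractor and strip_tags are made up for this example, and joining the text pieces with a single space is my own assumption about how the output should look.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text found outside a tag
        self.chunks.append(data)


def strip_tags(html):
    """Return the text of an HTML fragment with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the pieces with a space (an assumption, so that text from
    # adjacent elements does not run together), then collapse whitespace
    return " ".join(" ".join(parser.chunks).split())


if __name__ == "__main__":
    sample = '<div class="cue"><span>Hello</span> <b>world</b></div>'
    print(strip_tags(sample))
```

Unlike the Excel trick, this also converts entities such as &amp;amp; back to plain characters. One side effect of joining with a space is that a word split across tags (for example a single bolded letter) ends up with a gap in it, so adjust the join if your HTML does that.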
Also see -
HOW TO strip HTML tags and show just web page text programmatically and with EditPlus
Thursday, 12 July 2012
HOW TO convert HTML content to plain text - with Excel!
Posted on 20:14 by Unknown