Regular Expressions: Using Perl to Convert HTML to Latex

Hands-On Lab

Elle Krout

Content Team Lead in Content

Length

01:00:00

Difficulty

Intermediate

Perl and regular expressions go together so well that one of the most widely used regex flavors is named Perl-Compatible Regular Expressions (PCRE). In this hands-on lab, we're going to write a reusable script that converts HTML documents to LaTeX using Perl and its regular expressions.

Introduction

Perl and regular expressions go together so well that one of the most widely used regex flavors is named Perl-Compatible Regular Expressions (PCRE). In this learning activity, we're going to write a reusable script that converts HTML documents to LaTeX using Perl and its regular expressions.

Solution

Begin by logging in to the lab server using the credentials provided on the hands-on lab page:

ssh cloud_user@PUBLIC_IP_ADDRESS

It may be helpful to have two terminal windows open for this lab: one to write the script and the other to test it.

Craft a script to convert HTML to Latex

Using Perl, create a script that converts an HTML document to LaTeX, following the standards provided in the instructions.

Create the script:

vim html-to-latex.pl

Include the following:

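#!/usr/bin/perl
use strict;
use warnings;

# LaTeX preamble
print "\\documentclass{article}\n\\usepackage{hyperref}\n\\begin{document}\n";

# Read the input one line at a time into $_
while (<>) {
  # Headings; the pattern expects a space after the tag name,
  # so a bare <h1> with no attributes will not match
  s/<[Hh]1 [^>]*>(.+?)<\/[Hh]1>/\\title{$1}\n/;
  s/<[Hh]2 [^>]*>(.+?)<\/[Hh]2>/\\section{$1}\n\n/;
  s/<[Hh]3 [^>]*>(.+?)<\/[Hh]3>/\\subsection{$1}\n\n/;
  # Unordered lists
  s/<ul>/\\begin{itemize}\n/;
  s/<\/ul>/\\end{itemize}\n\n/;
  s/<li>/\\item /;
  s/<\/li>//;
  # Inline markup; (.+?) is non-greedy so several tags on one line work
  s/<code>(.+?)<\/code>/\\texttt{$1}/g;
  s/<a href="(.+?)">(.+?)<\/a>/\\href{$1}{$2}/g;
  s/<em>(.+?)<\/em>/\\textit{$1}/g;
  # Lines containing a literal "+ <p>": drop the paragraph tags
  if (/\+ <p>/) {
    s/<p>//g;
    s/<\/?p>//g;
  }
  # Any remaining single-line <p>...</p> becomes a verbatim block
  s/<p>(.+?)<\/p>/\\begin{verbatim}\n$1\n\\end{verbatim}\n/;
  # Collapse one doubled newline
  s/\n\n/\n/;
  print;
}

print "\\end{document}\n";
close;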
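
To see what individual substitutions do before running the whole script, the following minimal sketch applies three of the same rules to in-memory strings. The sub name convert_line and the sample HTML lines are made up for illustration and are not part of the lab files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the lab): applies a few of the
# script's rules to one line of HTML and returns the converted line.
sub convert_line {
    my ($line) = @_;
    $line =~ s/<[Hh]2 [^>]*>(.+?)<\/[Hh]2>/\\section{$1}/;
    $line =~ s/<a href="(.+?)">(.+?)<\/a>/\\href{$1}{$2}/g;
    $line =~ s/<em>(.+?)<\/em>/\\textit{$1}/g;
    return $line;
}

# Sample input invented for this example.
print convert_line('<h2 id="intro">Anchors</h2>'), "\n";
# prints: \section{Anchors}
print convert_line('<em>one</em> and <em>two</em>'), "\n";
# prints: \textit{one} and \textit{two}
print convert_line('See <a href="https://example.com">the docs</a>.'), "\n";
# prints: See \href{https://example.com}{the docs}.
```

With a greedy (.+) in place of (.+?), the second call would capture everything from the first <em> to the last </em> on the line, so only one \textit{} would be produced.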

Convert regexcs.html to Latex

Convert the provided HTML file to LaTeX and save the output as regexcs.latex:

perl -f html-to-latex.pl regexcs.html > regexcs.latex
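
One quick way to sanity-check the result is to confirm that every \begin{...} in the output has a matching \end{...}. For example, a <ul> tag that carries attributes would not match the script's <ul> pattern, while its closing tag still would. The sketch below is one possible check, not part of the lab. The sub name unbalanced_environments and the built-in sample text are hypothetical. Pass regexcs.latex as an argument to check the real file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns (name => begins minus ends) for every
# LaTeX environment whose \begin and \end counts differ.
sub unbalanced_environments {
    my ($latex) = @_;
    my %balance;
    $balance{$1}++ while $latex =~ /\\begin\{(\w+)\}/g;
    $balance{$1}-- while $latex =~ /\\end\{(\w+)\}/g;
    return map { $_ => $balance{$_} } grep { $balance{$_} != 0 } keys %balance;
}

# Read the file named on the command line; with no argument, fall back
# to a small invented sample that is missing \end{itemize}.
my $latex = @ARGV
    ? do { local $/; <> }
    : "\\begin{document}\n\\begin{itemize}\n\\item one\n\\end{document}\n";

my %off = unbalanced_environments($latex);
if (%off) {
    printf "%s: %+d\n", $_, $off{$_} for sort keys %off;
} else {
    print "All environments balanced\n";
}
```

A positive number means the environment is opened more often than it is closed, and a negative number means the opposite.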

Conclusion

Congratulations — you've completed this hands-on lab!