Currently almost the entire Lenin Archive needs to be repaired and cleaned up due to extremely poor HTML layout and mouseover code. To do this we've prepared this to-do list and a list of all works that have yet to be cleaned, written a script to automate part of the cleanup, and cleaned up Volume One of the Collected Works as a reference, so that volunteers have a guide on how to help with the cleanup effort.
To-Do Index
Things you'll need to set up for cleanup:
Cleanup Script
Enter the URL of the page in the tool above and click Fetch & Clean. This will fetch the HTML and run the cleanup script automatically. The cleaned HTML appears in the left pane and renders in the right pane. Use Copy HTML to copy it to your clipboard.
Paragraphs
Each paragraph should be on a single line (use Ctrl + J in Notepad++ to join lines). The same applies to every other element, but the problem is most common with paragraphs.
Like this:
<p>Text for illustration purposes. Text for illustration purposes.</p>
Instead of this:
<p>
Text for illustration purposes.
Text for illustration purposes.
</p>
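For files with many broken paragraphs, the joining step can be scripted. The following is a minimal Python sketch, not the project's cleanup script; the function name `join_paragraph_lines` is made up for illustration, and it assumes paragraphs are not nested and contain no preformatted text.

```python
import re

def join_paragraph_lines(html):
    """Collapse every <p>...</p> element onto a single line."""
    def collapse(match):
        open_tag, body = match.group(1), match.group(2)
        # Squeeze each run of whitespace (newlines included) into one space.
        body = re.sub(r"\s+", " ", body).strip()
        return f"{open_tag}{body}</p>"

    # (?s) lets "." cross newlines; the lazy ".*?" stops at the first </p>.
    # The word boundary after "p" keeps <pre> and <param> from matching.
    return re.sub(r"(?s)(<p\b[^>]*>)(.*?)</p>", collapse, html)

messy = "<p>\nText for illustration purposes.\nText for illustration purposes.\n</p>"
print(join_paragraph_lines(messy))
# prints: <p>Text for illustration purposes. Text for illustration purposes.</p>
```

Attributes on the opening tag are kept as they are; only the whitespace between the tags changes.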
Table and List Cleanup
Tables and lists need to be cleaned up to be more readable. We'll be following the W3 standard of 2 spaces for indentation for this.
<table>
  <tr>
    <td>Example Text</td>
  </tr>
</table>
V.I. Lenin Header
The V.I. Lenin header needs to have its links to the next work/chapter removed, and its text changed from "V. I. Lenin" to "Vladimir Ilyich Lenin".
| From this: | <h2> <a title="..." href="...">V. I.</a> <a title="..." href="...">Lenin</a></h2> |
| To this: | <h2>Vladimir Ilyich Lenin</h2> |
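If the header has to be fixed in many files, the From/To pair above can be expressed as one regular expression. This is only a sketch under the assumption that every header has exactly the two-link shape shown; `fix_lenin_header` is a hypothetical name, not part of the cleanup script.

```python
import re

# Matches <h2> containing two links whose texts are "V. I." and "Lenin".
HEADER = re.compile(
    r"<h2>\s*<a\b[^>]*>\s*V\.\s*I\.\s*</a>\s*<a\b[^>]*>\s*Lenin\s*</a>\s*</h2>"
)

def fix_lenin_header(html):
    """Replace the linked 'V. I. Lenin' header with the plain full name."""
    return HEADER.sub("<h2>Vladimir Ilyich Lenin</h2>", html)

old = '<h2> <a title="..." href="...">V. I.</a> <a title="..." href="...">Lenin</a></h2>'
print(fix_lenin_header(old))
# prints: <h2>Vladimir Ilyich Lenin</h2>
```

Headers that do not match the pattern are left untouched, so they can still be found and fixed by hand.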
Replace footer tables with the following format. If the work is not part of a multi-chapter work, remove the first table (the one before the <hr> tag).
<table class="footer">
  <tr>
    <td class="footer-backward"><a href="template.htm">Previous Chapter</a></td>
    <td>|</td>
    <td class="footer-forward"><a href="template.htm">Next Chapter</a></td>
  </tr>
  <tr>
    <td colspan="3"><a href="template.htm">Document Index</a></td>
  </tr>
</table>
<hr class="end">
<table class="footer">
  <tr>
    <td class="footer-backward" colspan="3"><a href="template.htm">< Backwards</a></td>
    <td></td>
    <td class="footer-forward" colspan="3"><a href="template.htm">Forwards ></a></td>
  </tr>
  <tr>
    <td><a href="../../index.htm">Works Index</a></td>
    <td>|</td>
    <td><a href="../../cw/template.htm">Current Volume</a></td>
    <td>|</td>
    <td><a href="../../cw/index.htm">Collected Works</a></td>
    <td>|</td>
    <td><a href="../../../index.htm">L.I.A. Index</a></td>
  </tr>
</table>
GUESS Links
The GUESS portion of links in indexes needs to be removed.
| From this: | <a href="i8i.htm#v02zz99h-135-GUESS">The Economic Theories...</a> |
| To this: | <a href="i8i.htm">The Economic Theories...</a> |
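Index pages can contain dozens of these links, so a small script may save time. The sketch below assumes that every GUESS fragment ends in `-GUESS` and that hrefs are double-quoted, as in the example above; `strip_guess_fragments` is an illustrative name only.

```python
import re

# Capture the href up to (not including) "#", then drop a fragment ending in -GUESS.
GUESS_FRAGMENT = re.compile(r'(href="[^"#]*)#[^"]*-GUESS"')

def strip_guess_fragments(html):
    """Remove '#...-GUESS' fragments from href attributes."""
    return GUESS_FRAGMENT.sub(r'\1"', html)

link = '<a href="i8i.htm#v02zz99h-135-GUESS">The Economic Theories...</a>'
print(strip_guess_fragments(link))
# prints: <a href="i8i.htm">The Economic Theories...</a>
```

Fragments that do not end in `-GUESS` are kept, since they may point at real anchors.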
PLACEHOLDER Footnotes
You'll occasionally find footnotes that only read as [PLACEHOLDER FOOTNOTE]. These need to be filled in using the <a> tag's id attribute.
id="bkV01P001F01": V01 = volume, P001 = page, F01 = footnote number
id="bkV01E001": V01 = volume, E001 = endnote number
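To look up the right footnote text it helps to decode the id into its parts. The following sketch assumes the two id shapes described above are the only ones in use; `parse_anchor_id` is a hypothetical helper, and it tolerates a leading "#" in case the id was copied from an href.

```python
import re

# bkV<volume>P<page>F<footnote>  or  bkV<volume>E<endnote>
ANCHOR_ID = re.compile(
    r"^bkV(?P<volume>\d+)(?:P(?P<page>\d+)F(?P<footnote>\d+)|E(?P<endnote>\d+))$"
)

def parse_anchor_id(anchor_id):
    """Return the numeric parts of a footnote/endnote id, or None if it does not fit."""
    match = ANCHOR_ID.match(anchor_id.lstrip("#"))
    if match is None:
        return None
    # Keep only the groups that took part in the match.
    return {key: int(value) for key, value in match.groupdict().items() if value is not None}

print(parse_anchor_id("bkV01P001F01"))  # {'volume': 1, 'page': 1, 'footnote': 1}
print(parse_anchor_id("bkV01E001"))     # {'volume': 1, 'endnote': 1}
```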
Automated Sections
This section lists the cleanup steps that the script normally performs. They are included here in case the script missed any of them.
HTML5 Conversion (partially automated)
Convert the DOCTYPE and html tag:
| From: | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "..."> <html xmlns="http://www.w3.org/1999/xhtml"> |
| To: | <!DOCTYPE html> <html lang="en"> |
Void elements such as <br> and <hr> no longer need a trailing / to close themselves.
| From: | <br /> <hr /> <hr class="section" /> |
| To: | <br> <hr> <hr class="section"> |
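When the script misses a file, the DOCTYPE, html tag, and void-element conversions above can be approximated as follows. This is a sketch and not the project's script: the name `to_html5` is made up, and the code assumes that no attribute value contains a ">" character.

```python
import re

DOCTYPE = re.compile(r"<!DOCTYPE\s+html\b[^>]*>", re.IGNORECASE)
HTML_TAG = re.compile(r"<html\b[^>]*>", re.IGNORECASE)
# HTML void elements; the optional attributes are captured lazily so the
# trailing whitespace and "/" can be dropped.
VOID_SLASH = re.compile(
    r"<(area|base|br|col|embed|hr|img|input|link|meta|source|track|wbr)\b([^>]*?)\s*/>",
    re.IGNORECASE,
)

def to_html5(html):
    """Swap in the HTML5 doctype and html tag, and drop '/' from void elements."""
    html = DOCTYPE.sub("<!DOCTYPE html>", html, count=1)
    html = HTML_TAG.sub('<html lang="en">', html, count=1)
    return VOID_SLASH.sub(r"<\1\2>", html)

print(to_html5('<br /> <hr /> <hr class="section" />'))
# prints: <br> <hr> <hr class="section">
```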
The name attribute is obsolete — remove it from anchor tags, keeping only id.
| From: | <a name="ft" id="ft"> |
| To: | <a id="ft"> |
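The name removal can be sketched in the same way. As a precaution that is our own assumption rather than a rule from this guide, the code below only drops name when the anchor also has an id, so that no link target is lost; `drop_redundant_name` is an illustrative name.

```python
import re

A_TAG = re.compile(r"<a\b[^>]*>", re.IGNORECASE)
NAME_ATTR = re.compile(r"""\s+name\s*=\s*(?:"[^"]*"|'[^']*')""", re.IGNORECASE)
HAS_ID = re.compile(r"\sid\s*=", re.IGNORECASE)

def drop_redundant_name(html):
    """Remove the name attribute from <a> tags that already carry an id."""
    def fix(match):
        tag = match.group(0)
        # Leave name-only anchors alone: removing name there would delete the target.
        return NAME_ATTR.sub("", tag) if HAS_ID.search(tag) else tag
    return A_TAG.sub(fix, html)

print(drop_redundant_name('<a name="ft" id="ft">'))
# prints: <a id="ft">
```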
Mouseover Code (automated)
Remove <a> tags with onmouseover/onmouseout attributes:
<a onmouseover="window.status='...'" onmouseout="window.status=''">This</a>
Remove self-closing anchor tags that look like this:
<a name='v04pp64h:14' />
<a name='v04pp64h:14'>
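A rough Python equivalent of this step is shown below. It rests on two assumptions of ours: the text inside a mouseover anchor is kept and only the tags are removed, and only the self-closing form of the name-only anchor is deleted, because removing an unclosed <a name='...'> safely requires checking for its closing tag by hand. The function name is hypothetical.

```python
import re

# An <a> whose opening tag carries onmouseover or onmouseout; group 1 is its content.
MOUSEOVER_A = re.compile(
    r"<a\b[^>]*\bonmouse(?:over|out)\s*=[^>]*>(.*?)</a>",
    re.IGNORECASE | re.DOTALL,
)
# A self-closed anchor that has nothing but a name attribute.
SELF_CLOSED_NAME_A = re.compile(
    r"""<a\s+name\s*=\s*(?:'[^']*'|"[^"]*")\s*/>""",
    re.IGNORECASE,
)

def strip_mouseover_code(html):
    """Unwrap mouseover anchors and delete self-closed name-only anchors."""
    html = MOUSEOVER_A.sub(r"\1", html)
    return SELF_CLOSED_NAME_A.sub("", html)

old = """<a onmouseover="window.status='...'" onmouseout="window.status=''">This</a>"""
print(strip_mouseover_code(old))
# prints: This
```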
Comments (automated)
Remove comments like the following:
<!-- Emacs-File-stamp: "~/Lia/archive/..." -->
<!-- t2h-body -->
<!-- vol=04 pg=013 src=v04pp64h type= -->
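The three comment shapes listed above can be matched with a single pattern. The sketch below assumes that these are the only generated comments and deliberately leaves every other comment in place; `strip_generated_comments` is an illustrative name, not the script's.

```python
import re

# Emacs file stamps, t2h-* markers, and "vol=.. pg=.." page markers,
# plus the rest of the line they sit on.
GENERATED_COMMENT = re.compile(
    r"<!--\s*(?:Emacs-File-stamp:|t2h-|vol=\d+\s+pg=\d+).*?-->[ \t]*\n?",
    re.DOTALL,
)

def strip_generated_comments(html):
    """Delete converter-generated comments; other comments are kept."""
    return GENERATED_COMMENT.sub("", html)

page = (
    '<!-- Emacs-File-stamp: "~/Lia/archive/..." -->\n'
    "<!-- t2h-body -->\n"
    "<p>Text</p>\n"
    "<!-- vol=04 pg=013 src=v04pp64h type= -->\n"
    "<!-- keep me -->\n"
)
print(strip_generated_comments(page))
```

With the sample above, only the paragraph and the "keep me" comment remain.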
Footnotes (automated)
Simplify footnote markup: change class="ednote"/"anote" to "endnote", move the class to the <sup> element, and remove the obsolete name attribute.
| From: | <sup><a class="anote" id="bkV01E001" name="bkV01E001" href="#fwV01E001">[1]</a></sup> |
| To: | <sup class="endnote"><a id="bkV01E001" href="#fwV01E001">[1]</a></sup> |
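For reference, the From/To change above can be reproduced with a short function. This is a sketch under the assumption that the note link always sits directly inside a bare <sup> and uses double-quoted attributes; `simplify_footnote` is a made-up name.

```python
import re

# <sup><a ... class="ednote|anote" ...>; groups 1 and 2 are the other attributes.
NOTE_LINK = re.compile(
    r'<sup>\s*<a\b([^>]*?)\sclass="(?:ednote|anote)"([^>]*)>',
    re.IGNORECASE,
)
NAME_ATTR = re.compile(r'\sname="[^"]*"')

def simplify_footnote(html):
    """Move the note class to <sup> as 'endnote' and drop the name attribute."""
    def fix(match):
        attrs = NAME_ATTR.sub("", match.group(1) + match.group(2))
        return f'<sup class="endnote"><a{attrs}>'
    return NOTE_LINK.sub(fix, html)

old = '<sup><a class="anote" id="bkV01E001" name="bkV01E001" href="#fwV01E001">[1]</a></sup>'
print(simplify_footnote(old))
# prints: <sup class="endnote"><a id="bkV01E001" href="#fwV01E001">[1]</a></sup>
```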
Entities (automated)
Replace character entities with the actual characters they represent.
.txt Links (automated)
In the infoblock, remove .txt links found in the Source info (usually around the year).
<a href="../../cw/v13pp72.txt">1972</a> → 1972
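A possible way to script this step is sketched below. Note the assumption: for simplicity the code unlinks every .txt link in the text it is given, so it should be applied to the infoblock only; `unlink_txt` is an illustrative name.

```python
import re

# Any <a> whose href ends in .txt; group 1 is the link text to keep.
TXT_LINK = re.compile(
    r'<a\b[^>]*\bhref="[^"]*\.txt"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)

def unlink_txt(html):
    """Replace links to .txt files with their plain link text."""
    return TXT_LINK.sub(r"\1", html)

print(unlink_txt('<a href="../../cw/v13pp72.txt">1972</a>'))
# prints: 1972
```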
README Links (automated)
Remove "• README" links in the infoblock that go to a dead end.
class="title" Attributes (automated)
Remove class="title" from heading elements; the attribute has no effect.
| From: | <h3 class="title">Title text</h3> |
| To: | <h3>Title text</h3> |
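As a sketch of how this can be done outside the script, the pattern below removes the attribute from h1 to h6 only. It assumes the class value is exactly "title" and double-quoted; `drop_title_class` is a hypothetical name.

```python
import re

# Group 1: "<hN" plus any attributes before class; group 2: the rest of the tag.
TITLE_CLASS = re.compile(r'(<h[1-6]\b[^>]*?)\s+class="title"([^>]*>)', re.IGNORECASE)

def drop_title_class(html):
    """Remove class="title" from heading tags, leaving other elements alone."""
    return TITLE_CLASS.sub(r"\1\2", html)

print(drop_title_class('<h3 class="title">Title text</h3>'))
# prints: <h3>Title text</h3>
```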
When You're Finished
Run your HTML through the Nu HTML Checker to catch any mistakes and any HTML5 conversions that were missed.
If you wish to be credited, add the following line to the info header of the work:
Refactored by: [Your Name/Handle]
Email the cleaned-up HTML file, together with a link to the work, to the current admin of the LIA to let them know about your work.