It can also do the reverse: extract just the comments and filter out the code. This is useful for counting comment lines, and for finding spelling or grammar errors in comments by running a spell checker or grammar checker on the output.
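As an illustration of the comment-extraction idea, here is a minimal sketch for languages with '#'-style comments (shell, Perl, AWK). It is not the code of xscc.awk: the function name extract_hash_comments is hypothetical, and the sketch deliberately ignores quoting, so a '#' inside a string literal is wrongly treated as the start of a comment.

```shell
# Hypothetical helper, NOT part of xscc.awk: print only the comment text
# of '#'-style comments. Assumption: no '#' appears inside string literals.
extract_hash_comments() {
    awk '
        FNR == 1 && /^#!/ { next }      # skip the interpreter line
        {
            pos = index($0, "#")        # first "#" on the line, 0 if none
            if (pos > 0) print substr($0, pos)
        }
    ' "$@"
}

# Count the comment lines in a small sample supplied on stdin.
printf '%s\n' '#!/bin/sh' '# greet the user' 'echo hi  # trailing note' 'exit 0' |
    extract_hash_comments | wc -l
```

The sample contains two comment lines, so that is the count reported. The same output could be piped into a spell checker instead of wc.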
Finally, it identifies and extracts copyright statements from code, based on a de facto coding convention. All comments in a file that precede the first line of code or the first blank line are assumed to be copyright statements. While this is not true in general, much commercial source code follows that convention.
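The heuristic can be sketched as follows for C-style comments. This is only an approximation written for this page, not the logic of xscc.awk itself: the function name leading_comments is hypothetical, and a line that starts with a comment but also contains code is treated as a pure comment line.

```shell
# Hypothetical helper, NOT part of xscc.awk: print the comment lines at
# the top of a C/C++/Java file and stop at the first blank line or the
# first line of code.
leading_comments() {
    awk '
        in_block {                          # inside a /* ... */ comment
            print
            if (index($0, "*/")) in_block = 0
            next
        }
        /^[ \t]*\/\// { print; next }       # a // line comment
        /^[ \t]*\/\*/ {                     # a block comment starts here
            print
            if (!index($0, "*/")) in_block = 1
            next
        }
        { exit }                            # blank line or code ends the header
    ' "$@"
}

# Example: only the first four lines are reported as the copyright header.
printf '%s\n' '/*' ' * Copyright (c) Example Corp.' ' */' \
    '// All rights reserved.' '' '// an ordinary comment' 'int x;' |
    leading_comments
```

Stopping at the first blank line is what keeps the later, ordinary comment out of the result.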
This script supports several programming languages, including Java, Java-IDL, C, C++, JavaScript, HTML, shell scripts, Perl, AWK, sed, and Makefiles. It identifies the file type from the file extension (as documented below) and lets you override that behavior with a command-line parameter, which is useful when providing the input via stdin.
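A minimal sketch of extension-based detection with an override, using the extensions documented in the script header. The function name guess_language and the "unknown" fallback are assumptions made for illustration; how xscc.awk actually handles an unrecognized extension is not stated here.

```shell
# Hypothetical helper, NOT part of xscc.awk.
# $1 = file name, $2 = optional override (plays the role of language=<lang>).
guess_language() {
    awk -v file="$1" -v forced="$2" '
        BEGIN {
            # Extension table as documented in the script header.
            map["java"] = map["idl"] = "Java"
            map["c"] = "C"
            map["C"] = map["cc"] = map["cpp"] = map["h"] = map["H"] = "C++"
            map["js"] = "JavaScript"
            map["htm"] = map["html"] = "HTML"
            map["sh"] = map["ksh"] = map["bash"] = "Shell"
            map["pl"] = map["perl"] = map["pm"] = "Perl"

            if (forced != "") { print forced; exit }   # override wins

            n = split(file, parts, ".")
            ext = (n > 1) ? parts[n] : ""              # text after the last dot
            lang = (ext in map) ? map[ext] : "unknown" # assumed fallback
            print lang
        }
    '
}

guess_language Foo.java     # inferred from the extension
guess_language main.C       # upper-case C maps to C++
guess_language - Perl       # stdin has no extension, so force the language
```

Note that the lookup is case-sensitive, which is what distinguishes c (C) from C (C++).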
The usage is best described by the comments in the source code itself. The full code is not reproduced here since it is not HTML-formatted yet, but you can download it here.
# eXtract Source Code Comment
#
# Usage: xscc.awk [extract=code|comment|copyright] [prune=copyright]
#                 [blanklines=1] [language=<lang>] file ...
#
# Note: If your shell environment does not have /usr/bin/awk available,
#       you might have to run this command by typing:
#           awk -f xscc.awk [extract=code|comment|copyright] [prune=copyright]
#                           [blanklines=1] [language=<lang>] file ...
#       Certain old versions of awk may not support this script. If the awk
#       on your system gives errors, consider using nawk or gawk.
#
# This AWK script extracts program source code, comments or copyright
# statements. Copyright statements are defined as the comment lines that
# precede the first line of code.
#
# The default behavior is to extract the source code, and filter the
# comments out. The optional arguments are described below:
#
# extract=code      -- print the code and filter the comments out.
#                      This mode is the default.
# extract=comment   -- print the comments and filter out the code.
# extract=copyright -- print the copyright statements only.
#
# prune=copyright   -- in the default mode (extract=code), print all
#                      code and comments that follow the copyright
#                      statements; the copyright statements themselves
#                      are filtered out.
#                      In the 'extract=comment' mode, print all
#                      comments other than the copyright statements.
#
# blanklines=1      -- print blank lines as well; by default, blank
#                      lines are not printed.
#
# language=<lang>   -- force a specific language as per the following
#                      table, rather than infer the language from the
#                      extension, which is the default behavior.
#
# This script supports the following programming languages, and infers the
# language from the file extension (unless overridden using language=<lang>)
# as follows:
#
# Language     Extensions
# Java         java, idl
# C            c
# C++          C, cc, cpp, h, H
# JavaScript   js
# HTML         htm, html
# Shell        sh, ksh, bash
# Perl         pl, perl, pm
#