eXtract Source Code Comment

This is an AWK script, inspired by the application Source Lines of Code and Comment Count (slocc.sh). It extracts the source code from a file and filters out all the comments, which makes it useful as a preprocessing step for other scripts that need only the code. Such scripts might count the total lines of code, count literals as a complexity measure, grep for occurrences of a program literal in the code, or generate tags or analyze dependencies.

It also works in reverse: it can extract just the comments and filter out the code. This is useful for counting lines of comments and for locating spelling or grammar errors in comments by running a spell checker or grammar checker on the output.
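As a rough illustration of the two modes described above, here is a minimal sketch for languages that use `#` line comments. It is not the xscc.awk implementation: the function name `split_hash_comments` is hypothetical, and the sketch deliberately ignores `#` characters inside quoted strings and treats a shebang line as a comment.

```sh
# Hypothetical, simplified sketch: handles only "#" line comments.
split_hash_comments() {
    # $1 is the mode, "code" (default) or "comment"; input is read from stdin.
    awk -v extract="${1:-code}" '
    {
        pos = index($0, "#")                 # position of the first "#", 0 if none
        code    = (pos > 0) ? substr($0, 1, pos - 1) : $0
        comment = (pos > 0) ? substr($0, pos)        : ""
        out = (extract == "comment") ? comment : code
        sub(/[ \t]+$/, "", out)              # trim trailing whitespace
        if (out ~ /[^ \t]/) print out        # skip lines that end up blank
    }'
}

# Prints the two code lines: x=1 and echo "$x"
printf '%s\n' '# setup' 'x=1  # counter' '' 'echo "$x"' | split_hash_comments code

# Prints the two comments: # setup and # counter
printf '%s\n' '# setup' 'x=1  # counter' '' 'echo "$x"' | split_hash_comments comment
```

A real extractor has to track string literals and, for languages such as C or Java, block comments that span several lines, so this sketch only conveys the overall shape of the filtering.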

Finally, it identifies and extracts copyright statements from code based on a de facto coding style. All comments in a file that precede the first line of code or the first blank line are assumed to be copyright statements. While this is not true in general, much commercial source code follows that convention.
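The heuristic above can be sketched in a few lines of AWK. This is an illustration only, assuming `#` line comments; the function name `leading_comment_block` and the sample input are hypothetical, and the real script may detect the end of the copyright block differently.

```sh
# Hypothetical sketch of the "leading comments are the copyright" heuristic.
leading_comment_block() {
    awk '
    ended              { next }          # past the leading block, ignore the rest
    /^[ \t]*#/         { print; next }   # still inside the leading comment block
                       { ended = 1 }     # first blank line or line of code ends it
    '
}

# Prints only the first two lines; the later "# helper" comment is not included.
printf '%s\n' \
    '# Copyright (c) Example Author' \
    '# All rights reserved.' \
    '' \
    '# helper' \
    'x=1' | leading_comment_block
```

Because the block ends at the first blank line, a file whose copyright notice contains an empty line would be cut short, which is one way the convention can fail.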

This script supports several programming languages, including Java, Java-IDL, C, C++, JavaScript, HTML, shell scripts, Perl, AWK, sed, and Makefiles. It identifies the file type from the file extension (as documented below) and allows that behavior to be overridden with a command-line parameter, which is useful when providing the input via stdin.
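To make the extension-based detection concrete, the following sketch maps a file name to a language using the extension table from the script's usage comment. The function name `guess_language`, the `unknown` fallback, and the way the override is passed are assumptions made for illustration, not the script's actual code.

```sh
# Hypothetical sketch: infer the language from the file extension,
# unless an explicit override is given as the second argument.
guess_language() {
    awk -v file="$1" -v language="${2:-}" 'BEGIN {
        if (language == "") {
            n = split(file, parts, ".")          # split on literal dots
            ext = (n > 1) ? parts[n] : ""        # text after the last dot
            if      (ext ~ /^(java|idl)$/)      language = "Java"
            else if (ext == "c")                language = "C"
            else if (ext ~ /^(C|cc|cpp|h|H)$/)  language = "C++"
            else if (ext == "js")               language = "JavaScript"
            else if (ext ~ /^(htm|html)$/)      language = "HTML"
            else if (ext ~ /^(sh|ksh|bash)$/)   language = "Shell"
            else if (ext ~ /^(pl|perl|pm)$/)    language = "Perl"
            else                                language = "unknown"
        }
        print language
    }'
}

guess_language Foo.java      # prints Java
guess_language main.C        # prints C++ (the match is case-sensitive)
guess_language - Perl        # prints Perl, the override wins for stdin input
```

The override matters for stdin because there is no file name, and therefore no extension, to inspect.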

The usage is best described by the comments in the source code itself, reproduced below. The script's code is not shown here because it has not been formatted as HTML yet, but you can download it here.

# eXtract Source Code Comment
#
# Usage: xscc.awk [extract=code|comment|copyright] [prune=copyright]
#                 [blanklines=1] [language=<lang>] file ...
#
# Note:  If your shell environment does not have /usr/bin/awk available
#        you might have to run this command by typing:
# awk -f xscc.awk [extract=code|comment|copyright] [prune=copyright]
#                 [blanklines=1] [language=<lang>] file ...
# Certain old versions of awk may not support this script. If the awk
# on your system gives errors, consider using nawk or gawk.
#
# This AWK script extracts program source code, comments or copyright 
# statements. Copyright statements are defined as the comment lines that
# precede the first line of code.
#
# The default behavior is to extract the source code, and filter the
# comments out. The optional arguments are described below:
#
#    extract=code      -- print the code, filter comments out. This mode
#                         is the default, unless overridden otherwise.
#    extract=comment   -- print the comments, and filter out the code. 
#    extract=copyright -- print the copyright statements only.
#
#    prune=copyright   -- in the default mode (extract=code), it prints
#                         all code and comments following the copyright 
#                         statements, which are filtered out. 
#                         In the 'extract=comment' mode, it prints all
#                         comments other than the copyright statements.
# 
#    blanklines=1      -- by default, blank lines are not printed, unless 
#                         specified using this option.
#    
#    language=<lang>   -- force a specific language as per the following
#                         table, rather than infer the language from the
#                         extension, which is the default behavior.
#
# This script supports the following programming languages, and infers the 
# language from the file extension (unless overridden using language=<lang>)
# as follows:
#
# Language         Extensions
# Java             java, idl
# C                c
# C++              C, cc, cpp, h, H
# JavaScript       js
# HTML             htm, html
# Shell            sh, ksh, bash
# Perl             pl, perl, pm
#

The script is currently distributed in packed form, produced by another AWK script that I wrote for the purpose. You can use this packed version for free under the terms set forth here. You can also purchase the original source code with all the comments for $10 by clicking here.