User talk:Flcelloguy/Tool

Latest comment: 18 years ago by Jnothman in topic Python version

*sulk* *whine* Why does Kate's tool have to be down? Anyway, you may want to mention that the user needs to copy-n-paste from a specific URL, like this: http://en.wikipedia.org/w/index.php?title=Special:Contributions&target=Flcelloguy&offset=0&limit=5000. Also, for those who might have a favorite editor, I've added a section below for that. --Interiot 00:53, 5 December 2005 (UTC)
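To get the right page for any account, only the target in that URL has to change. Here is a minimal sketch that builds it. The base URL and the parameter names (title, target, offset, limit) are copied from the example link. The function name contributions_url is hypothetical.

```python
from urllib.parse import urlencode

def contributions_url(user, limit=5000, offset=0):
    """Build a Special:Contributions URL like the example link, for any user."""
    params = {
        "title": "Special:Contributions",
        "target": user,
        "offset": offset,
        "limit": limit,
    }
    # safe=":" keeps "Special:Contributions" readable instead of "%3A"
    return "http://en.wikipedia.org/w/index.php?" + urlencode(params, safe=":")

# Prints the same URL as the example link above
print(contributions_url("Flcelloguy"))
```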


Equivalent commands in your favorite editor/operating system

Vim

  • type "+p to paste the system clipboard into the editor (a plain "p" pastes Vim's own unnamed register)
  • type ":g!/(hist)/d" and hit enter, to delete every line that does not contain "(hist)"
  • type Control-G, and the total number of lines should be displayed at the bottom of your screen

MS-DOS

  • find /c "(hist)" filename

Unix

  • grep -c '(hist)' filename
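Python

All of these commands count the lines that contain the literal text "(hist)", since each pasted contribution line carries that link. A minimal sketch of the same count in Python, with a hypothetical function name:

```python
def count_hist_lines(lines):
    """Count lines containing the literal text "(hist)".

    Same result as `grep -c '(hist)' filename` or `find /c "(hist)" filename`.
    """
    return sum(1 for line in lines if "(hist)" in line)
```

Saved as a script, calling print(count_hist_lines(sys.stdin)) after "import sys" and feeding it the pasted list on standard input gives the same number as the commands above.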

Quick and dirty

You can do a count by copy-pasting the contributions list into Microsoft Word, selecting all and formatting it as a numbered list. You can probably do something similar in other word processors. the wub "?!" 00:01, 6 December 2005 (UTC)

True, right now the "tool" is at a crude stage where all it can do is count edits by parsing through them and incrementing a variable. However, I plan to add statistics soon, e.g. a breakdown by namespace, the percentage of minor edits, the percentage of edits with edit summaries, etc. This is the basic framework for future versions. Thanks! Flcelloguy (A note?) 01:07, 6 December 2005 (UTC)
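Once each line has been parsed, the planned statistics are simple ratios over the parsed edits. A minimal sketch, assuming a hypothetical Edit record that a parser would fill in for every contribution line (the record and the function edit_stats are illustrative, not part of the tool):

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Edit:
    # Hypothetical record for one parsed contribution line
    namespace: str      # "" for the main namespace
    minor: bool         # True if the line was flagged "m"
    has_summary: bool   # True if the edit had an edit summary

def edit_stats(edits):
    """Return (edits per namespace, % minor edits, % edits with a summary)."""
    by_namespace = Counter(e.namespace or "main" for e in edits)
    total = len(edits)
    if total == 0:
        return by_namespace, 0.0, 0.0
    pct_minor = 100.0 * sum(e.minor for e in edits) / total
    pct_summary = 100.0 * sum(e.has_summary for e in edits) / total
    return by_namespace, pct_minor, pct_summary
```

A single Counter keyed by namespace replaces one counter variable per namespace, and the percentages fall out of the same list of records.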
In terms of breaking out the specific statistics, I'm doing that with my tool [1], but it turns out to be difficult to do in a language-agnostic way. That is, it's hard to tell apart strings like "Category Talk:", "Kategorie Diskussion:" and "Please: stop reverting!" and automatically realize that only the last one is in the main namespace. But somebody on IRC mentioned that Kate's tool won't return for a couple more weeks :(, so I guess it's good to have some alternatives. --Interiot 03:17, 6 December 2005 (UTC)
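The problem is that a colon alone says nothing: the text before it is a namespace only if it appears in that wiki's own list of namespace names, and those names differ per language. A minimal sketch of that check follows. The lists in NAMESPACES are illustrative and incomplete, the function name namespace_of is hypothetical, and the case-insensitive comparison is an assumption.

```python
# Illustrative, incomplete namespace lists; a real tool would need the
# full list for each wiki it parses.
NAMESPACES = {
    "en": ["Talk", "User", "User talk", "Category", "Category talk"],
    "de": ["Diskussion", "Benutzer", "Benutzer Diskussion",
           "Kategorie", "Kategorie Diskussion"],
}

def namespace_of(title, lang="en"):
    """Return the namespace prefix of a title, or "" for the main namespace."""
    prefix, sep, _rest = title.partition(":")
    # Assumed case-insensitive, so "Category Talk" matches "Category talk"
    known = {ns.lower() for ns in NAMESPACES.get(lang, [])}
    if sep and prefix.lower() in known:
        return prefix
    return ""
```

With this, "Please: stop reverting!" falls through to the main namespace, while "Kategorie Diskussion:" is only recognized when the German list is used.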
I'm working on a more sophisticated extension to the tool at User:Titoxd/Flcelloguy's Tool, which will be able to correctly parse page names, namespaces, minor/major edits, edit summaries and recent edits from the HTML of the Special:Contributions page, with no need for cutting and pasting. It's still in its early stages, though. Titoxd(?!? - did you read this?) 20:53, 8 December 2005 (UTC)

Python version

I find your code, with its repeated if statements etc., very redundant, with a lot of room for improvement in terms of size. You shouldn't need a separate variable for each count. But I couldn't be bothered working with Java data structures, so here is essentially the same thing in Python, reading from standard input and writing to standard output:

import re
import sys

# Look for "(diff) ", an optional "m" (minor edit), then either everything
# up to and including the first colon (a possible namespace prefix) or,
# if there is no colon, the rest of the line.
CONTR_RE = re.compile(r'\(diff\) (m?) ?([^:]*:|.*)')

namespaces = [
    '', 'Talk',
    'User', 'User talk',
    'Category', 'Category talk',
    'Image', 'Image talk',
    'MediaWiki', 'MediaWiki talk',
    'Template', 'Template talk',
    'Wikipedia', 'Wikipedia talk'
]

# One counter per (minor flag, namespace) pair instead of one variable each
counts = {}
for ns in namespaces:
    counts[('', ns)] = 0
    counts[('m', ns)] = 0

ns_set = set(namespaces)
for line in sys.stdin:
    match = CONTR_RE.search(line)
    if match is None:
        # Not a contribution line (e.g. pasted page text): skip it
        continue
    minor, ns = match.groups()
    # Only a known prefix ending in a colon counts as a namespace;
    # anything else belongs to the main namespace
    if ns.endswith(':') and ns[:-1] in ns_set:
        ns = ns[:-1]
    else:
        ns = ''
    counts[(minor, ns)] += 1

def print_row(title, major, minor, tot=None):
    """Print one tab-separated row, padding the title to 16 characters."""
    if tot is None:
        tot = major + minor
    print('%-16s\t%s\t%s\t%s' % (title, major, minor, tot))


print_row('Namespace', 'MAJ', 'MIN', 'TOT')
print_row('---------', '---', '---', '---')

total_major = 0
total_minor = 0
for ns in namespaces:
    ns_name = ns or 'main'
    print_row(ns_name, counts[('', ns)], counts[('m', ns)])
    total_major += counts[('', ns)]
    total_minor += counts[('m', ns)]

print_row('TOTAL', total_major, total_minor)

jnothman talk 09:06, 5 January 2006 (UTC)