Whitelist MediaWiki Namespaces with $wgWhitelistRead

MediaWiki is designed for the most part to be an open document repository. In most setups (presumably), everyone can read and only registered users can edit. Out of the box, however, read permissions can’t get much more granular than this. For my project at least, I don’t just want to keep anonymous users from editing. I also want to selectively keep them from reading certain things.

I looked around for quite some time until I came upon a variable you can set in your LocalSettings.php file: $wgWhitelistRead. Basically, this variable whitelists the pages specified in the array. The downside to this is you can’t use wildcards or namespaces/categories. You must specify a single page per array value. This doesn’t quite cut it for my needs. That being said, here’s my solution (albeit rough).

The end goal here looks like this…

  • All users are blocked from reading and writing all pages

  • Users in all groups are then given read access to the whitelisted namespaces

  • Finally, users in the specified groups have read and write access to all pages (save for the administration/sysop pages of course).

Limiting All Access

To do this, in your LocalSettings.php file, place the following four lines…

$wgGroupPermissions['*']['read'] = false;
$wgGroupPermissions['*']['edit'] = false;
$wgGroupPermissions['user']['read'] = false;
$wgGroupPermissions['user']['edit'] = false;

Granting Sysop Access

Once you have the lines from the last section in your config file, your entire wiki should be unavailable, even to sysops (they are users after all). To give access back to your sysop folk, place the following two lines in your LocalSettings.php file…

$wgGroupPermissions['sysop']['read'] = true;
$wgGroupPermissions['sysop']['edit'] = true;

This will only grant access to your authenticated sysop users. If they’re not already authenticated, they still can’t get to the Special:UserLogin form (we’ll get to that in just a few) to log in. They may be sysops at heart, but hearts don’t authenticate people without usernames and passwords.
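It may help to see why the sysop group needs its own explicit grant. As far as I understand it, MediaWiki treats group rights as additive: a user holds a right if any group they belong to grants it. Here is a minimal sketch of that idea in plain PHP. The function name userHasRight and the $permissions array are made up for illustration; this is a simplified model, not MediaWiki’s actual code.

```php
<?php
// Hypothetical, simplified model of additive group rights (not MediaWiki's real implementation).
function userHasRight( array $groupPermissions, array $userGroups, string $right ): bool {
    foreach ( $userGroups as $group ) {
        // A right counts as granted only when a group sets it to exactly true.
        if ( ( $groupPermissions[$group][$right] ?? false ) === true ) {
            return true;
        }
    }
    return false;
}

// Mirrors the settings from the sections above.
$permissions = array(
    '*'     => array( 'read' => false, 'edit' => false ),
    'user'  => array( 'read' => false, 'edit' => false ),
    'sysop' => array( 'read' => true,  'edit' => true ),
);

// A logged-in sysop belongs to '*', 'user' and 'sysop', so the sysop grant applies.
var_dump( userHasRight( $permissions, array( '*', 'user', 'sysop' ), 'read' ) ); // bool(true)
// A regular logged-in user only belongs to '*' and 'user'.
var_dump( userHasRight( $permissions, array( '*', 'user' ), 'read' ) );          // bool(false)
```

In this model, turning a right off for the user group doesn’t stop a more specific group from turning it back on, which is exactly what the sysop lines rely on.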

Granting Individual Group Access

Now that our sysops have permissions, next we need a custom group so we can grant permissions to its members. We’ll call that group GreenTea (yes, I’m drinking some green tea right now). To do that, let’s throw another few lines in the LocalSettings.php file…

$wgGroupPermissions['greentea'] = $wgGroupPermissions['user'];
$wgGroupPermissions['greentea']['read'] = true;
$wgGroupPermissions['greentea']['edit'] = true;

Granting Minimal Global Permissions

Now that our group is set up, we need to whitelist the pages that anonymous folk need in order to log in, plus any others we want them to see. To do this, let’s add yet another few lines to our LocalSettings.php file…

$wgWhitelistRead = array(
  'Main Page',
  'Special:Userlogin',
  'Special:UserLogout',
);

What we just did was whitelist the main page, the login page, and the logout page. This allows users to get in and out of your wiki, whether or not their permissions allow them access to anything. At this point, you can log in with your sysop user and put people into our previously created greentea group. Once that’s done, the greentea users should have full access to the entire wiki.

I would like to note here that at this point, users outside of the greentea group will have the same permissions as anonymous/unauthenticated users. They cannot read or edit any pages other than the ones currently whitelisted.

Editing MediaWiki to Whitelist Namespaces

This is the only part that’s out of the ordinary here. We are going to edit actual MediaWiki code. The big downside to doing this is that if the MediaWiki instance is upgraded, it is highly likely that the changes made in this section will be overwritten. Thankfully though, the changes are very simple, so making them again shouldn’t be a problem. They’re so simple in fact, I think the MediaWiki folks might actually accept my code into their branch.
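Before patching anything, it is worth checking which MediaWiki version you run. As far as I know, releases from 1.21 onward ship a $wgWhitelistReadRegexp setting that takes full regular expressions (delimiters included), which would make the core edit unnecessary. A sketch for LocalSettings.php, assuming your version supports the setting and that patterns are matched against the full page title. The Help namespace here is only an example.

```php
// LocalSettings.php (assumes MediaWiki 1.21 or later; check the manual for your version)
$wgWhitelistReadRegexp = array(
  '/^Help:/',   // example only: every page whose title starts with "Help:"
);
```

If that setting isn’t available to you, the rest of this section still applies.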

To set up our MediaWiki instance so it handles regex whitelist statements, we need to edit the Title.php file in the includes directory.

Firstly, we need to comment out the code that processes the whitelist variable. Head to around line 1870 in Title.php (the exact line will depend on your MediaWiki version) and comment out just the following lines…

//Check with and without underscores
if ( in_array( $name, $wgWhitelistRead, true ) || in_array( $dbName, $wgWhitelistRead, true ) )
  return true;

Now that those have been commented out, we need to add in the code that will process regex statements in the whitelist array. Below the lines you just commented out, add the following code…

// Treat each whitelist entry as an anchored regex; check with and without underscores
foreach ( $wgWhitelistRead as $item )
  if ( preg_match( '/^' . $item . '$/', $name )
    || preg_match( '/^' . $item . '$/', $dbName ) )
    return true;
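If you want to try the matching logic outside of MediaWiki first, here is a standalone sketch of the same check. The function name isWhitelisted and the sample titles are my own inventions for testing. $whitelist stands in for $wgWhitelistRead, and $name and $dbName are the page title with spaces and with underscores, the same way Title.php uses them.

```php
<?php
// Standalone sketch of the patched check; not part of MediaWiki itself.
function isWhitelisted( array $whitelist, string $name, string $dbName ): bool {
    foreach ( $whitelist as $item ) {
        // Anchor the entry so it has to match the whole title, not just part of it.
        $pattern = '/^' . $item . '$/';
        if ( preg_match( $pattern, $name ) || preg_match( $pattern, $dbName ) ) {
            return true;
        }
    }
    return false;
}

// Hypothetical whitelist and page titles, purely for illustration.
$whitelist = array( 'Main Page', 'Help:.*' );
var_dump( isWhitelisted( $whitelist, 'Main Page', 'Main_Page' ) );          // bool(true)
var_dump( isWhitelisted( $whitelist, 'Help:Editing', 'Help:Editing' ) );    // bool(true)
var_dump( isWhitelisted( $whitelist, 'Private Notes', 'Private_Notes' ) );  // bool(false)
```

Note that each entry is tested against both forms of the title, so 'Main Page' and 'Main_Page' are both covered by one whitelist entry.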

Usage

To use the changes we just put in place, all that needs to be done is edit the $wgWhitelistRead variable in LocalSettings.php again.

Say, for example, that we have a HowTo namespace (HowTo:Drink Green Tea for example) that we want everyone to be able to read, including people who aren’t in the greentea group (they have to learn somehow after all). All that needs to be done is a little regex…

$wgWhitelistRead = array(
  'Main Page',
  'Special:Userlogin',
  'Special:UserLogout',
  'HowTo:.*',
);

That just whitelisted all pages inside the HowTo namespace.
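One thing to watch out for: every entry in the array is now treated as a regular expression, so a plain page title containing characters like +, ?, ( or / can behave unexpectedly. A slash, for instance, collides with the / delimiter the patch uses. PHP’s preg_quote function escapes such characters. Here is a small sketch; the helper name literalTitle and the page title C++ Tips are hypothetical.

```php
<?php
// Hypothetical helper: escape a literal page title so it is safe as a whitelist regex entry.
function literalTitle( string $title ): string {
    // The second argument also escapes the '/' delimiter used by the patch.
    return preg_quote( $title, '/' );
}

$wgWhitelistRead = array(
  'Main Page',
  literalTitle( 'C++ Tips' ),   // made-up title; stored as 'C\+\+ Tips'
  'HowTo:.*',
);
```

Without the escaping, the entry C++ Tips would not even match its own page title, because the plus signs would be read as quantifiers.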

A Bad Explanation Attempt

In case anyone who doesn’t know is wondering why you put a .* at the end of the HowTo namespace, here you go.

In regular expressions, various symbols have special meanings. In this case, the period stands for any single character: a letter of either case, a number, a symbol, etc. That means that HowTo:. would match anything like HowTo:A, HowTo:3, HowTo:-, etc. It would however not match HowTo:A123. Why? The period in regular expressions matches only one character. What we need is to say match any character any number of times after HowTo:. For that we’ll need the asterisk.

The asterisk in regular expressions is what we call a quantifier. It doesn’t represent a character so much as a quantity. In plain terms, an asterisk means that the previous character in the regex string can be repeated zero or more times and still match. That means that the regular expression c* would match the empty string, c, cccc, cccccc, etc. It would however not match, for example, b, 5, 12345a, etc. In our example, HowTo:.*, the period represents any character and it is followed by an asterisk, so any article title that starts with HowTo: will match, no matter what comes after the colon, even if nothing does.
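If you’d rather see it than read about it, here is a quick sketch you can run from the command line. It compares the single period against the period-plus-asterisk on a few titles, with the patterns anchored by ^ and $ the same way the patch builds them. Secret:Plans is just a made-up title that should never match.

```php
<?php
// Sample titles; 'Secret:Plans' is a hypothetical page outside the HowTo namespace.
$titles = array( 'HowTo:A', 'HowTo:A123', 'HowTo:', 'Secret:Plans' );

foreach ( $titles as $title ) {
    $single = preg_match( '/^HowTo:.$/', $title );   // exactly one character after the colon
    $any    = preg_match( '/^HowTo:.*$/', $title );  // zero or more characters after the colon
    printf( "%-13s  '.': %d  '.*': %d\n", $title, $single, $any );
}
```

Only HowTo:A matches the single period, while all three HowTo: titles match with .* and Secret:Plans matches neither.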

Hopefully someone finds this post useful. If anyone has questions about .* please ask them in the comments.