I am not sure if this is the best place to ask this question, but I was looking at a Wikipedia article and noticed there were a lot of edits for that article. Since you can view each edited iteration of the article, I figured the amount of space those web pages take would add up. There are millions of articles and Wikipedia doesn't seem to get rid of vandalized edits. Alternatively, I was thinking that Wikipedia could keep the base article and use the edit history as instructions for how to edit the base article.

My question, as a person with a limited background in programming, is: how does a site like Wikipedia store previous edits of pages? Does it store each version of a page in full, is each edit stored as instructions for modifying a base article, or is some other approach used?

Or, I guess, is the data storage used so minimal, that this isn't even an issue?

  • see [Why we're not customer support for \[your favorite company\]](https://meta.stackoverflow.com/q/255745/839601) – gnat Jul 25 '18 at 18:48
  • yes. yes they do – Ewan Jul 25 '18 at 18:56
  • I'm voting to close this question as off-topic because this is an implementation detail of a specific website, unrelated to software development in general. – Martin Maat Jul 25 '18 at 19:30
  • It's not off-topic; just imagine the title is more generic, suffixed with "for example, Wikipedia". – Ewan Jul 26 '18 at 08:23

2 Answers

The software that runs Wikipedia is called MediaWiki, and you can see in its documentation that it does keep the full text (the "wikitext") of each revision, though that text may be compressed or stored in a separate database.
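To make the "full text per revision" idea concrete, here is a minimal sketch in Python. The table layout and the function names (`open_store`, `save_revision`, `load_revision`) are made up for illustration and are not MediaWiki's actual schema or API; the only point is that every revision is saved whole, optionally compressed, and can be read back on its own.

```python
import sqlite3
import zlib


def open_store():
    """Create an in-memory database with a hypothetical revision table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision ("
        " rev_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " page TEXT NOT NULL,"
        " body BLOB NOT NULL)"
    )
    return conn


def save_revision(conn, page, wikitext):
    """Store the complete text of one revision, compressed with zlib."""
    blob = zlib.compress(wikitext.encode("utf-8"))
    cur = conn.execute(
        "INSERT INTO revision (page, body) VALUES (?, ?)", (page, blob)
    )
    return cur.lastrowid


def load_revision(conn, rev_id):
    """Read one revision back without touching any other revision."""
    row = conn.execute(
        "SELECT body FROM revision WHERE rev_id = ?", (rev_id,)
    ).fetchone()
    if row is None:
        raise KeyError(rev_id)
    return zlib.decompress(row[0]).decode("utf-8")
```

Loading any revision is a single lookup, which is what keeps this scheme simple.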

> Alternatively, I was thinking that Wikipedia could keep the base article and use the edit history as instructions for how to edit the base article.

This is possible but would be unnecessarily complicated, since:

> Or, I guess, is the data storage used so minimal, that this isn't even an issue?

Yep, you nailed it. This is text; it really doesn't take much space at all compared to images or videos, and nowadays storage space is ridiculously abundant and cheap.
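For comparison, the "base article plus edit instructions" scheme from the question might look roughly like the sketch below. It is only an illustration built on Python's standard `difflib`; the helper names (`make_delta`, `apply_delta`, `rebuild`) and the instruction format are assumptions of this example, not anything Wikipedia is known to use. It shows where the extra complexity comes from: reading revision `n` means replaying `n` sets of instructions on top of the base.

```python
import difflib


def make_delta(old, new):
    """Return edit instructions that turn `old` into `new`."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged span: reference it by position in the old text.
            ops.append(("copy", i1, i2))
        else:
            # replace/insert/delete: emit the new text (empty for a delete).
            ops.append(("text", new[j1:j2]))
    return ops


def apply_delta(old, delta):
    """Apply one set of edit instructions to `old`."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


def rebuild(base, deltas, n):
    """Reconstruct revision `n` by replaying the first `n` deltas on `base`."""
    text = base
    for delta in deltas[:n]:
        text = apply_delta(text, delta)
    return text
```

A damaged or missing delta in the middle of the chain would also make every later revision unreadable, which is another cost that full-text storage avoids.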

Michael Borgwardt

Some of it is stored as plain text in the database, some in concatenated gzip format, and some as compressed diffs.
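As a rough illustration of why compressing several revisions together helps (this is not MediaWiki's actual code, and the function names below are hypothetical), consecutive revisions of an article are nearly identical, so a compressor that sees them in one stream can encode the repeats very cheaply:

```python
import json
import zlib


def pack_together(revisions):
    """Compress a list of revision texts as one blob."""
    return zlib.compress(json.dumps(revisions).encode("utf-8"))


def unpack(blob):
    """Recover the list of revision texts from a packed blob."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def size_separately(revisions):
    """Total bytes used when each revision is compressed on its own."""
    return sum(len(zlib.compress(r.encode("utf-8"))) for r in revisions)


def size_together(revisions):
    """Bytes used when all revisions share one compressed blob."""
    return len(pack_together(revisions))
```

The trade-off is that reading one revision now means decompressing the whole group it was packed with.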

Tgr