True story
The simplest reason to separate newer content from older content is that your database is getting too big. A few years ago I was involved with building a huge financial application that was based on a pre-existing database. It kept a variety of fiscal data, and it already had several years' worth of data in it when I first became involved with the project. Having a single database store years of data only makes sense when you actually need all of this data at any given time. The rest of the team said that was the case, so I didn't think much of it.
A few weeks into the project I realised that the reality was a little bit different. Our users only needed to access the current fiscal year's data, except:
- When producing yearly reports, something that happened once a year
- When producing three-year reports, something that happened once every three years
We decided to split the database into separate databases, one per year. It wasn't an easy decision but it worked: the current fiscal year database had lightning-fast responses, as it was only scanning through a very small subset of the whole data. The yearly reports were also generated on the fly, for the same reason. The three-year reports were a little bit slower to generate than before, and it took a lot of creativity to combine three archives, but that was a very small disadvantage of the process.
So our database was split into small archive databases, and everyone was happy. (Not really: there were a lot of rough edges, and of course this is an oversimplified version of the story, but overall the decision to archive was a good one.)
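A minimal sketch of the per-year split described above, not the actual application's code. It assumes SQLite with in-memory databases, and the `entries` table, the sample amounts, and the function names are all hypothetical. It shows why single-year queries stay cheap while multi-year reports have to visit each archive in turn:

```python
import sqlite3


def open_year_db(year: int) -> sqlite3.Connection:
    """Create a hypothetical per-year database.

    In-memory for this sketch; a real setup would use `year` to pick
    a separate file or server.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, amount REAL)")
    return conn


# One database per fiscal year; schema and figures are illustrative only.
databases = {year: open_year_db(year) for year in (2009, 2010, 2011)}
databases[2009].execute("INSERT INTO entries (amount) VALUES (100.0)")
databases[2010].execute("INSERT INTO entries (amount) VALUES (250.0)")
databases[2011].execute("INSERT INTO entries (amount) VALUES (75.5)")


def yearly_total(year: int) -> float:
    """Day-to-day queries and yearly reports touch one small database."""
    row = databases[year].execute(
        "SELECT COALESCE(SUM(amount), 0) FROM entries"
    ).fetchone()
    return row[0]


def multi_year_total(years) -> float:
    """Multi-year reports query each archive and combine the results."""
    return sum(yearly_total(year) for year in years)
```

Calling `yearly_total(2011)` only ever scans the 2011 data, whereas `multi_year_total([2009, 2010, 2011])` pays for one query per archive, which mirrors the slower three-year reports in the story.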
Heterogeneous data
Another reason to archive is a lack of data homogeneity over time. When the schema of the database changes, especially for huge databases, your best bet for the older data is a document-oriented database like MongoDB. Document-oriented databases have flexible schemas, which means that they don't care if you don't use the same fields to describe every record, so you don't end up with empty fields as you would with a relational database.
And as Jeff O correctly notes in a comment to another answer, archived data by definition won't change, so you don't need to care about transactions and other relational functionality. (Added here, in case comment or answer goes AWOL)
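To illustrate the flexible-schema point, here is a small sketch that uses plain JSON documents in Python rather than a real document database API. The records and field names are made up for the example:

```python
import json

# Two hypothetical article records written years apart, with different fields.
old_article = {
    "title": "Launch day",
    "body": "We are live.",
    "published": "2004-03-01",
}
new_article = {
    "title": "Redesign",
    "body": "New look.",
    "published": "2011-11-01",
    "tags": ["design"],
    "author": {"name": "A. Writer"},
}

# A document store keeps each record as-is. A relational table would need
# every column on every row, leaving empty (NULL) fields for the older record.
archive = [json.dumps(doc) for doc in (old_article, new_article)]

for raw in archive:
    doc = json.loads(raw)
    # Readers handle missing fields explicitly instead of relying on empty columns.
    print(doc["title"], doc.get("tags", []))
```

The older record simply has no `tags` or `author` key, and since archived records are never updated, nothing has to be migrated to the newer shape.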
Archiving on news-oriented websites
News-oriented websites with lots of data may opt to archive their older content into a document-oriented database: since they deal in news, their fresher content is a lot more valuable to them from a business point of view.
SEO & pagination
Lastly, archiving has nothing to do with SEO or pagination: pagination will work regardless of where your content is stored. A bot will scan through your paginated content, following the pagination links like any other link. If you adopt a sensible URI schema for all your content, you have nothing to worry about. For example, imagine you have a blog with ten years of articles and you've decided to move all articles dated 2010-12-31 or earlier to a document storage archive.
Your homepage would probably have a list of newest articles, something like
http://example.com/articles/2011-11-1/title
http://example.com/articles/2011-10-30/title
http://example.com/articles/2011-10-20/title
Going through the pages you finally stumble upon a page:
http://example.com/articles/2011-1-1/title
http://example.com/articles/2010-12-30/title
http://example.com/articles/2010-12-25/title
The URI schema is the same, regardless of whether the article is stored in your current database or your archive database. All you have to do is a simple server-side check when your visitor (human or bot) clicks on the 2010-12-30 article:
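// articleDate is the date parsed from the requested URI
if (articleDate <= archiveCutoff) { // archiveCutoff is 2010-12-31, as a date value
    // get article information from archive
} else {
    // get article information from current database
}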
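The check above can be sketched in runnable form as follows. This is only an illustration in Python: the `pick_store` function, the `ARTICLE_PATH` pattern, and the store names `"archive"` and `"current"` are assumptions, and the path layout is taken from the example URIs above:

```python
import re
from datetime import date

# Articles dated on or before this day live in the archive.
ARCHIVE_CUTOFF = date(2010, 12, 31)

# Matches paths such as /articles/2010-12-30/title
# (month and day may be unpadded, as in 2011-1-1).
ARTICLE_PATH = re.compile(r"^/articles/(\d{4})-(\d{1,2})-(\d{1,2})/([^/]+)$")


def pick_store(path: str) -> str:
    """Return 'archive' or 'current' for an article path."""
    match = ARTICLE_PATH.match(path)
    if match is None:
        raise ValueError(f"not an article path: {path}")
    year, month, day = (int(part) for part in match.groups()[:3])
    article_date = date(year, month, day)
    return "archive" if article_date <= ARCHIVE_CUTOFF else "current"
```

Parsing the URI into a real date value matters here: the example URIs are not zero-padded, so comparing them as strings (or writing the cutoff as a bare `2010-12-31`, which most languages read as a subtraction) would give wrong answers.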
Now, why some sites choose to move archived content into a special archive section is something that only those who built them can answer. There may be a few user experience factors involved, but that's off topic for Programmers; you can try asking the folks over at User Experience Stack Exchange.