Tech Blog :: Drupal: Re-Sync Content Taxonomy from core taxonomy


Jan 21 '11 1:19pm

I'm working on a major Drupal (6) site migration now, and one of the components that needs to be automatically migrated is taxonomies, mapping old vocabularies and terms to new ones. Core taxonomy stores node terms in the term_node table, which is fairly easy to work with. However, two taxonomy supplements are making the process a little more complicated:

First, we're using Content Taxonomy to generate prettier/smarter taxonomy fields on node forms. The module allows partial vocabularies to be displayed, in nicer formats than the standard multi-select field. The problem with Content Taxonomy for the migration, however, is that it duplicates the term_node links into its own CCK tables. If a node is mapped to a term in term_node, but not in the Content Taxonomy table, when the taxonomy list appears on the node form, the link isn't set.
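As a rough illustration of the mismatch, the sketch below finds term links that exist in term_node but are missing from a Content Taxonomy field table. It runs against an in-memory SQLite database rather than a real Drupal site, and the field name field_tags, the table content_field_tags and the column field_tags_value are assumptions chosen for the example, not names from the actual site.

```python
import sqlite3


def missing_links(db, field="field_tags"):
    """Return (nid, vid, tid) rows that are in term_node but not in the field table.

    Assumes a shared CCK field stored in content_<field>, with the term ID
    kept in the <field>_value column (hypothetical names for this sketch).
    """
    table = f"content_{field}"
    column = f"{field}_value"
    query = f"""
        SELECT tn.nid, tn.vid, tn.tid
        FROM term_node tn
        LEFT JOIN {table} ct
          ON ct.vid = tn.vid AND ct.{column} = tn.tid
        WHERE ct.vid IS NULL
        ORDER BY tn.nid, tn.tid
    """
    return db.execute(query).fetchall()


def demo_db():
    """Build a tiny stand-in for the two Drupal 6 tables involved."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE term_node (nid INTEGER, vid INTEGER, tid INTEGER);
        CREATE TABLE content_field_tags (
            vid INTEGER, nid INTEGER, delta INTEGER, field_tags_value INTEGER
        );
        -- Node 1 has terms 10 and 11 in core taxonomy, node 2 has term 10.
        INSERT INTO term_node VALUES (1, 1, 10), (1, 1, 11), (2, 2, 10);
        -- Only one of those links made it into the Content Taxonomy table.
        INSERT INTO content_field_tags VALUES (1, 1, 0, 10);
    """)
    return db


if __name__ == "__main__":
    for nid, vid, tid in missing_links(demo_db()):
        print(f"node {nid} (revision {vid}) is missing term {tid}")
```

On a real site the query would probably also need to be limited to the vocabulary that the field is configured for, since term_node holds terms from every vocabulary.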

Ideally, the module would have the ability to re-sync from term_node built in. There's an issue thread related to this (Keep core taxonomy & CCK taxonomy synced), but it's not resolved.

So I wrote a Drush command to do this. To run it, replace "MODULE" with the name of your custom module, back up your database, read the code so you understand what it does, and run drush sync-content-taxonomy --verbose.

Warning: This code only works properly on *shared* CCK fields, that is, fields with their own tables (content_field_X tables, not a common content_type_Y table). Don't use this if your fields are only used by one content type.
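To make the difference concrete, here is a minimal sketch of clearing a field in each storage layout, again using an in-memory SQLite database as a stand-in for Drupal. The table and column names (content_field_tags, content_type_story, field_topic_value, field_subtitle_value) and the clear_field helper are hypothetical, and this is not the code from the Gist.

```python
import sqlite3


def clear_field(db, field, shared, content_type=None):
    """Empty one CCK field's values without touching other fields.

    shared=True:  the field has its own content_<field> table, so every
                  row in that table belongs to this field and can be deleted.
    shared=False: the field is a column in content_type_<type>, next to the
                  type's other fields, so only that column is set to NULL.
    """
    column = f"{field}_value"
    if shared:
        db.execute(f"DELETE FROM content_{field}")
    else:
        db.execute(f"UPDATE content_type_{content_type} SET {column} = NULL")


def demo_db():
    """Create one shared-field table and one per-content-type table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE content_field_tags (
            vid INTEGER, nid INTEGER, delta INTEGER, field_tags_value INTEGER
        );
        CREATE TABLE content_type_story (
            vid INTEGER, nid INTEGER,
            field_topic_value INTEGER, field_subtitle_value TEXT
        );
        INSERT INTO content_field_tags VALUES (1, 1, 0, 10);
        INSERT INTO content_type_story VALUES (1, 1, 10, 'Keep me');
    """)
    return db


if __name__ == "__main__":
    db = demo_db()
    clear_field(db, "field_tags", shared=True)
    clear_field(db, "field_topic", shared=False, content_type="story")
    print(db.execute("SELECT * FROM content_type_story").fetchall())
```

A DELETE against content_type_story would have removed the subtitle along with the term, which is the failure mode the warning is about. In Drupal itself, I believe CCK's content_database_info() reports which table and columns a field uses, so a real implementation could check that instead of taking a shared flag; treat that as an assumption to verify against your CCK version.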

[Embedded Gist - if it's not showing up, click here.]

The other taxonomy supplement that needs to be migrated is Primary Term. I'll be writing a similar Drush script for this in the next few days.

Update 1/27: There was a bug in the way it cleared the tables before rebuilding; it should be good now. (Make sure to download the latest Gist.)

Thanks for sharing this!
I tried the migration on a test site. The terms were inserted into the CCK fields OK, BUT: all other CCK fields in the node type were emptied :)
So this kind of wiped my site... Any idea what could cause this?

Uh oh. First, I hope you backed up your database as the post advises.
Second, I can see what happened: your CCK fields were probably only used by one content type, and my code assumes (incorrectly for you) that they're shared. So it deletes the rows in the field's table, which for a shared field only affects that field, but in your case affected the whole content type. Good to know; I'll put a note warning people about that. You're welcome to modify the code to work in both cases: it would have to check whether the field is shared, and if it isn't, do a SET x_field_value=NULL instead of DELETE FROM {table}.

Yes, my database was safe when trying this. Thanks for the additional information. I will just make sure that all fields are shared fields and try again.
