Sitemap Generator
Posted: Wed Feb 27, 2008 5:04 pm
Hello! I'm planning to create a new component that automatically generates sitemaps for Joomla 1.5-based websites, and I'd like some guidance on how to go about it. I've looked through the extensions currently available but, unfortunately, none of them gives me what I want to get out of them. That is why I'd like to build one from scratch, and I'd like to do it correctly.
I'm a beginner when it comes to SQL and PHP, but I learn quickly and have plenty of free time, which is why I've decided to do this myself instead of waiting for someone else to make a new component.
What I'd like to do is generate the sitemap.xml and urllist.txt files used by Google Sitemaps and Yahoo! Site Explorer, which help websites get their content into search engines. Generating these files seems to be a fairly simple exercise in retrieving information from the J! database, and I'd like some guidance on the proper way to do it.
First off, here's a sample sitemap in the format defined by the sitemaps.org sitemap protocol:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2005-01-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
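To check that I understand the two file formats, here is a rough sketch of how they could be written out from a list of entries. It is in Python rather than PHP, only so that the logic can be tested outside Joomla. The function names `build_sitemap` and `build_urllist` and the dictionary keys are made up for illustration; the real component would need PHP equivalents.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_sitemap(entries):
    """Return the text of a sitemap.xml file for a list of entry dicts.

    Each entry must have a 'loc' key; the other tags are optional.
    (Hypothetical helper, not part of Joomla.)
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in the URL for us
        ET.SubElement(url, "loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if entry.get(tag) is not None:
                ET.SubElement(url, tag).text = str(entry[tag])
    body = ET.tostring(urlset, encoding="unicode")
    # The returned string should be written to disk as UTF-8
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


def build_urllist(entries):
    """Return the text of a urllist.txt file: one raw URL per line."""
    return "".join(entry["loc"] + "\n" for entry in entries)
```

Both functions take the same list, so the component would only have to collect the article data once and could write both files from it.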
That said, here are the things I would like to learn:
1. How can I properly query an article's URL using PHP and/or SQL, taking into account the three SEF settings in the Administration backend? Are there any specific switches/plugins that I have to query? This information goes into the <loc> tags.
2. Same as above, but for the <lastmod> tags: how can I get the last modification date from the database into this tag?
3. What would be a proper use of the <changefreq> and <priority> tags? At this point I assume I can use whether or not an article is published to the frontpage to give it a higher priority than other articles. I'd probably ask for a default priority value and give articles published on the frontpage a priority 0.1 higher.
4. How do I get the component to go through each and every item in the content table? Should I use some kind of recursive script, and how do I tell it to stop when it's gone through each article?
5. How do I take account of Section and Category descriptions? These are pages in themselves, but do they have last modification date values as well?
6. How do I take account of the menu hierarchy set up on the site? What plugins/components/modules should I also call in my script, and how do I call them in the first place?
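To make questions 2 to 4 more concrete, here is a rough sketch of the loop I have in mind. It is in Python against a throwaway SQLite database, only so that the logic can be tested; the real component would be PHP against the Joomla database. The table and column names (`content`, `content_frontpage`, `state`, `created`, `modified`) are my guesses modelled on the Joomla 1.5 tables. The 0.5 default priority is just a placeholder, and the fallback to the creation date assumes unedited articles carry an all-zero `modified` value. Building the URL for <loc> is left out because that is exactly what question 1 asks.

```python
import sqlite3

DEFAULT_PRIORITY = 0.5  # placeholder; would be a setting of the component
FRONTPAGE_BONUS = 0.1   # frontpage articles rank 0.1 higher

# Table and column names are assumptions, not verified against Joomla
QUERY = """
    SELECT c.id, c.created, c.modified,
           f.content_id IS NOT NULL AS on_frontpage
    FROM content AS c
    LEFT JOIN content_frontpage AS f ON f.content_id = c.id
    WHERE c.state = 1
    ORDER BY c.id
"""


def article_entries(conn, default_priority=DEFAULT_PRIORITY):
    """Yield one dict per published article.

    No recursion is needed: the loop simply ends when the query
    has no more rows to return.
    """
    for article_id, created, modified, on_frontpage in conn.execute(QUERY):
        # Assumption: an all-zero 'modified' date means "never edited"
        stamp = created if modified.startswith("0000") else modified
        priority = default_priority + (FRONTPAGE_BONUS if on_frontpage else 0.0)
        yield {
            "id": article_id,
            "lastmod": stamp[:10],  # keep only YYYY-MM-DD
            "priority": f"{min(priority, 1.0):.1f}",  # never above 1.0
        }
```

If this is roughly right, a single query with a plain loop over its rows would answer question 4, and the frontpage join would cover the priority idea from question 3.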
Thanks in advance for any illuminating answers and suggestions. Anyone who has some free time and is more knowledgeable than me with PHP and SQL is also welcome to help me with this component.