The question about the redcap algorithm #2467

JagnDC · 2023-11-16T06:12:46Z

I would like to ask a question about the REDCAP algorithm, I wrote the REDCAP algorithm according to the paper, but the calculation results seem to be different from the results of geoda, and some problems are found through comparison.

For FullOrder-AverageLinkage, my data property values are as follows:

[[6 5 4]
[8 9 3]
[5 9 0]
[8 7 8]
[4 7 1]]

The initial adjacency relation is as follows.

[[0.1.1.1.1.]
[1.0.1.1 1.0.]
[1.1.0.0.1.]
[1.1.0.0.0.]
[1.0.1.0.0.]]

I used this to cluster five nodes and set the number of clusters to two.
My code calculates the following:
{0: [0, 4, 1, 2], 1: [3]}
geoda calculates the following:
{0: [0, 1, 3], 1: [2, 4]}

I didn't know which of these calculations was correct, so I looked at the spanning tree, and I found that my algorithm built the following tree adjacency:

[[0.0.0.0.1.]
[0. 0. 1.
[0.1.0.0.1.]
[0.1. 0. 0.]
[1.0.1.0.0.]]

geoda's tree is:

0 5 "clusters" POLY_ID
3 5 6
5 3 6
1 2 21
21 21
1 5 17
5 1 17
1 4 24
4 1 24

After manual calculation, I found that according to my understanding, the results of the paper should be consistent with the results of my algorithm. May I ask if there is a problem with my calculation results? Or maybe there's something wrong in some other place that I don't understand. Thanks to the Author

The text was updated successfully, but these errors were encountered:

lixun910 · 2023-11-16T16:05:44Z

Thanks for your question! @JagnDC Can you post your step-by-step manual calculation so we can check what might be the problem? You can see a step-by-step manual calculation using FullOrder-CompleteLinkage in our GeoDa workbook: https://geodacenter.github.io/workbook/9c_spatial3/lab9c.html#appendix

JagnDC · 2023-11-22T13:16:16Z

Thank you for your reply. This is my example of the spanning tree construction process diagram. I don’t know if I calculated it correctly, but I checked it many times.
Here, I use the FullOrder-AverageLinkage

lixun910 · 2023-11-22T16:09:47Z

Can you also post how did you update the centroid (with average linkage) of the merged cluster after making a connection? I didn't see it in your attached image. You can also post the dendrogram of the hierarchical clustering of average linkage, which helps to explain the calculation (like in the GeoDa documentation I posted). Thanks!

JagnDC · 2023-11-23T08:31:15Z

Thank you for your reply.

First we need to update the result above. In the last step I made a mistake, but the conclusion remains the same.

In addition, here are the distance matrix of each step and the result of merging. I don't know if you can understand my meaning

JagnDC · 2023-11-23T09:03:17Z

By the way, when I set the number of clusters to 1, will the spanning tree I get be a complete spanning tree?

lixun910 · 2023-11-23T14:39:21Z

The calculation doesn't look correct to me: after merging 2 and 4, why all the sudden 3 has connection to 2?

Yes, It's better to post a screenshot of the dendrogram of SCHC, see in GeoDa documentation: Thanks!

A careful consideration of the various REDCAP algorithms reveals that they essentially consist of three steps. First, a dendrogram for contiguity constrained hierarchical clustering is constructed, using the given linkage function. This yields the exact same dendrogram as produced by SCHC. Next, this dendrogram is turned into a spanning tree, using standard graph manipulation principles. Finally, the optimal cuts in the spanning tree are obtained using the same logic (and computations) as in SKATER, up to the desired level of k.

JagnDC · 2023-11-24T03:04:25Z

Here's the dendrogram I drew and the SCHC dendrogram, and I've also labeled the order of the edges.

To be clear, I'm just building a spanning tree here.

For your question, you can see my clustering on the right side of the distance matrix diagram. Since 2 and 4 are merged into one cluster, cluster 2 in state 2 represents the set of cluster 2 and cluster 4 in state 1. When updating the distance between cluster 2 and other clusters in the calculation process, all distances are calculated. I devised a separate step to determine whether clusters are adjacent.

To avoid misunderstandings, I have updated the process plot of the distance matrix here.

lixun910 · 2023-11-24T03:29:30Z

Thanks for updating the screenshot! Yes, the SCHC represents the results in Redcap in GeoDa. In your state 2, how do you get 8.41861? since 3 is not connected to neither 2 or 4 from state 1. On Nov 23, 2023, at 8:04 PM, dingchen ***@***.***> wrote: [image]<https://urldefense.com/v3/__https://user-images.githubusercontent.com/20790537/285346690-6540311c-70b0-48bf-8e6b-45287f09d2b0.png__;!!BpyFHLRN4TMTrA!6b1peUbbLTzH4DmFKqM6021St7nGtLShyKTFD5deGFwYmmLKrRBjgP9BIswTVCOv_Sd1FNYjzJv1PmuIkfM2JrhiLA$> Here's the dendrogram I drew and the SCHC dendrogram, and I've also labeled the order of the edges. To be clear, I'm just building a spanning tree here. For your question, you can see my clustering on the right side of the distance matrix diagram. Since 2 and 4 are merged into one cluster, cluster 2 in state 2 represents the set of cluster 2 and cluster 4 in state 1. When updating the distance between cluster 2 and other clusters in the calculation process, all distances are calculated. I devised a separate step to determine whether clusters are adjacent. [image]<https://urldefense.com/v3/__https://user-images.githubusercontent.com/20790537/285347433-c9fd998f-e7ee-4a2c-8833-c240179fdeba.png__;!!BpyFHLRN4TMTrA!6b1peUbbLTzH4DmFKqM6021St7nGtLShyKTFD5deGFwYmmLKrRBjgP9BIswTVCOv_Sd1FNYjzJv1PmuIkfNSKOOmzg$> — Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/GeoDaCenter/geoda/issues/2467*issuecomment-1825093881__;Iw!!BpyFHLRN4TMTrA!6b1peUbbLTzH4DmFKqM6021St7nGtLShyKTFD5deGFwYmmLKrRBjgP9BIswTVCOv_Sd1FNYjzJv1PmuIkfNkNfW6aQ$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AASPYTZ7EWYPQL7565UMSODYGAFEJAVCNFSM6AAAAAA7NTOWP2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRVGA4TGOBYGE__;!!BpyFHLRN4TMTrA!6b1peUbbLTzH4DmFKqM6021St7nGtLShyKTFD5deGFwYmmLKrRBjgP9BIswTVCOv_Sd1FNYjzJv1PmuIkfNKae2lwg$>. You are receiving this because you commented.Message ID: ***@***.***>

JagnDC · 2023-11-24T04:38:41Z

Thank you for your reply. For your question, the average of all distances between two clusters I calculated here, because it is the calculation between feature values, so no judgment of adjacency is made, and constraints are made in the subsequent calculation process, so the final calculation results will not be affected.

However, I would like to thank you for reminding me. I have optimized my calculation process and reduced unnecessary calculation process.

lixun910 · 2023-11-24T05:27:30Z

You are welcome! Just noticed another one: the 4.57081 values in state 2 seems not correct. All these calculations will impact the final results. The SCHC is a special case of classic hierarchical clustering with fully connected weights. The implementation is identical to other clustering libraries. On Nov 23, 2023, at 9:38 PM, dingchen ***@***.***> wrote: Thank you for your reply. For your question, the average of all distances between two clusters I calculated here, because it is the calculation between feature values, so no judgment of adjacency is made, and constraints are made in the subsequent calculation process, so the final calculation results will not be affected. However, I would like to thank you for reminding me. I have optimized my calculation process and reduced unnecessary calculation process. — Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/GeoDaCenter/geoda/issues/2467*issuecomment-1825138841__;Iw!!BpyFHLRN4TMTrA!9awoiDVpcRBA_hp4zAdo48tEoQlB8Sm5cFrUEEYBD_Aj8z1T8sEVcX7fGhMP-SQEABJcZrrVVNu2C49KFQDzWQ2h0g$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AASPYT7MAXCT3DEDLDZNS23YGAQFXAVCNFSM6AAAAAA7NTOWP2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRVGEZTQOBUGE__;!!BpyFHLRN4TMTrA!9awoiDVpcRBA_hp4zAdo48tEoQlB8Sm5cFrUEEYBD_Aj8z1T8sEVcX7fGhMP-SQEABJcZrrVVNu2C49KFQBNv82aTA$>. You are receiving this because you commented.Message ID: ***@***.***>

JagnDC · 2023-11-24T05:44:46Z

Thank you for your reply. I see what you mean. I want to briefly explain my calculation process.

According to the full pattern in the original article, when I merge nodes 2,4 in state 1, I should calculate the distance between the other clusters, which are cluster 0 and cluster 1. You are confused about the distance between cluster 1 and cluster 2

dis(node 1, node 2)=√(9+0+9)=√18
dis(node 1, node 4)= √(16+4+4)=√24
avgdis(cluster 1, cluster 2)= (√18+ √24)/2=4.5708

dis(node 0, node 1)=√21
avgdis(cluster 0, cluster 2)= √21/1=4.5826

avgdis(cluster 1, cluster 2) < avgdis(cluster 0, cluster 2)

lixun910 · 2023-11-24T06:14:28Z

I am confused: among node 1-5, you are merging node 3 and 5 at first step from your screenshot. This is corresponding to row/column 2 and 4 in state 0 table. It should be simply the average of the two columns/rows. In state 1 table, 4.93383 is correct, and 4.57081 is not correct and some others are not correct as well. Please check and state carefully. On Nov 23, 2023, at 10:45 PM, dingchen ***@***.***> wrote: Thank you for your reply. I see what you mean. I want to briefly explain my calculation process. According to the full pattern in the original article, when I merge nodes 2,4 in state 1, I should calculate the distance between the other clusters, which are cluster 0 and cluster 1. You are confused about the distance between cluster 1 and cluster 2 dis(node 1, node 2)=√(9+0+9)=√18 dis(node 1, node 4)= √(16+4+4)=√24 avgdis(cluster 1, cluster 2)= √18+ √24=4.5708 — Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/GeoDaCenter/geoda/issues/2467*issuecomment-1825175890__;Iw!!BpyFHLRN4TMTrA!81WITMV5Zf3pokGKGavnV60tbxBtaJ6jlH1cnZLAcBau5QDZh39eAXglly0_aiytkw654W1W_6B3jAL5CNIU3EvlcA$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AASPYT3U2CNQVILOWJR4HSDYGAX5RAVCNFSM6AAAAAA7NTOWP2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRVGE3TKOBZGA__;!!BpyFHLRN4TMTrA!81WITMV5Zf3pokGKGavnV60tbxBtaJ6jlH1cnZLAcBau5QDZh39eAXglly0_aiytkw654W1W_6B3jAL5CNIdLZ6Brg$>. You are receiving this because you commented.Message ID: ***@***.***>

JagnDC · 2023-11-24T06:37:55Z

Thank you for your reply. I think my expression is not clear enough, so I updated the graph of distance matrix and refer to the format of your GeoDa documentation.
As you can see in this graph, 4.57081 is the average of 4.2426 and 4.8989. I have repeated the calculation here many times, I think it should be no problem.

lixun910 · 2023-11-24T06:44:11Z

This is not correct: the state 0 table has values on all non-diagonal cells, which means each node connects with every other nodes. Please check carefully. On Nov 23, 2023, at 11:38 PM, dingchen ***@***.***> wrote: [image]<https://urldefense.com/v3/__https://user-images.githubusercontent.com/20790537/285377563-37dac79d-9189-4352-9f2a-d95ae3acf465.png__;!!BpyFHLRN4TMTrA!--KgXj2u4tSk03Hw8mgnXCg1dSpt5JMx7ovz8sJt7eS-ThZ4Jm5iZ6P16thOeRKJQa12PxSN1hG0RoDEv2f_EHL5-w$> Thank you for your reply. I think my expression is not clear enough, so I updated the graph of distance matrix and refer to the format of your GeoDa documentation. As you can see in this graph, 4.57081 is the average of 4.2426 and 4.8989. I have repeated the calculation here many times, I think it should be no problem. — Reply to this email directly, view it on GitHub<https://urldefense.com/v3/__https://github.com/GeoDaCenter/geoda/issues/2467*issuecomment-1825210514__;Iw!!BpyFHLRN4TMTrA!--KgXj2u4tSk03Hw8mgnXCg1dSpt5JMx7ovz8sJt7eS-ThZ4Jm5iZ6P16thOeRKJQa12PxSN1hG0RoDEv2crc7rujA$>, or unsubscribe<https://urldefense.com/v3/__https://github.com/notifications/unsubscribe-auth/AASPYT2A5OTZNIT3KTH365DYGA6E5AVCNFSM6AAAAAA7NTOWP2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQMRVGIYTANJRGQ__;!!BpyFHLRN4TMTrA!--KgXj2u4tSk03Hw8mgnXCg1dSpt5JMx7ovz8sJt7eS-ThZ4Jm5iZ6P16thOeRKJQa12PxSN1hG0RoDEv2c7Qyj-TA$>. You are receiving this because you commented.Message ID: ***@***.***>

JagnDC · 2023-11-24T06:47:42Z

Thank you for your reply. What I refer to here is the composition of this matrix diagram in the GeoDa documentation. What I marked in the red box is the distance of adjacency relationship, the non-red box is the non-adjacency, and the yellow background color is the minimum distance of adjacent nodes as the merge target.

lixun910 · 2023-11-24T07:18:19Z

I see. Can you send me the files you used in GeoDa? So I can check the details. Thanks!

JagnDC · 2023-11-24T07:33:18Z

test_data.zip
Thank you very much for your attention and answer. Here is my data, where feature1 and feature2 are the location coordinates, feature3-5 are the feature columns used, and clusters is the result of my clustering.

Feel free to contact me if you have any new findings or questions!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The question about the redcap algorithm #2467

The question about the redcap algorithm #2467

JagnDC commented Nov 16, 2023

lixun910 commented Nov 16, 2023 •

edited

Loading

JagnDC commented Nov 22, 2023 •

edited

Loading

lixun910 commented Nov 22, 2023

JagnDC commented Nov 23, 2023

JagnDC commented Nov 23, 2023

lixun910 commented Nov 23, 2023

JagnDC commented Nov 24, 2023 •

edited

Loading

lixun910 commented Nov 24, 2023 via email

JagnDC commented Nov 24, 2023

lixun910 commented Nov 24, 2023 via email

JagnDC commented Nov 24, 2023 •

edited

Loading

lixun910 commented Nov 24, 2023 via email

JagnDC commented Nov 24, 2023

lixun910 commented Nov 24, 2023 via email

JagnDC commented Nov 24, 2023

lixun910 commented Nov 24, 2023

JagnDC commented Nov 24, 2023

The question about the redcap algorithm #2467

The question about the redcap algorithm #2467

Comments

JagnDC commented Nov 16, 2023

lixun910 commented Nov 16, 2023 • edited Loading

JagnDC commented Nov 22, 2023 • edited Loading

lixun910 commented Nov 22, 2023

JagnDC commented Nov 23, 2023

JagnDC commented Nov 23, 2023

lixun910 commented Nov 23, 2023

JagnDC commented Nov 24, 2023 • edited Loading

lixun910 commented Nov 24, 2023 via email

JagnDC commented Nov 24, 2023

lixun910 commented Nov 24, 2023 via email

JagnDC commented Nov 24, 2023 • edited Loading

lixun910 commented Nov 24, 2023 via email

JagnDC commented Nov 24, 2023

lixun910 commented Nov 24, 2023 via email

JagnDC commented Nov 24, 2023

lixun910 commented Nov 24, 2023

JagnDC commented Nov 24, 2023

lixun910 commented Nov 16, 2023 •

edited

Loading

JagnDC commented Nov 22, 2023 •

edited

Loading

JagnDC commented Nov 24, 2023 •

edited

Loading

JagnDC commented Nov 24, 2023 •

edited

Loading