xavier-nx fab 300 vs fab 301 #1692

Open
goetzdd opened this issue Sep 10, 2024 · 4 comments

@goetzdd

goetzdd commented Sep 10, 2024

Hold the flaming, but we're still using a tegrademo hardknott branch (up to date with the latest, but yes, two years old). It has been successful for us, from a TX2 on a custom carrier across to Xavier NX and Nano. Today we went to bring up a couple more Xavier NX modules, and while the devices flash successfully and our custom rootfs and its bits and pieces are there and running, there is no display. Our custom bootloader splash image is shown on screen, but no HDMI is detected and no EDID is read, so nothing appears beyond the bootloader image.

On a hunch I checked the devices, and the only difference is 8-0001-300 vs 8-0001-301 in the QR code. Looking back at the NVIDIA downloads page, there is a PCN for this change from 2020, and it appears the hardknott branch should already support any differences. The change was supposedly limited to the memory, and memory appears to be OK, since our app and rootfs all appear fine.

I think these are the relevant errors in dmesg:

[ 1.820229] tegra_cec 3960000.tegra_cec: dt=1 start=0x03960000 end=0x03960FFF irq=77
[ 1.820246] tegra_cec 3960000.tegra_cec: Unpowergate DISP: 0.
[ 1.820706] tegra_cec 3960000.tegra_cec: Enable clock result: 0.
[ 1.820760] tegra_cec 3960000.tegra_cec: tegra_cec_init started
[ 1.821271] tegra_cec 3960000.tegra_cec: cec_add_sysfs ret=0
[ 1.821281] tegra_cec 3960000.tegra_cec: probed
[ 1.822045] tegradc 15200000.nvdisplay: disp0 connected to head0->/host1x/sor1
[ 1.822170] generic_infoframe_type: 0x87
[ 1.822347] tegradc 15200000.nvdisplay: DT parsed successfully
[ 1.822426] tegradc 15200000.nvdisplay: Display dc.ffffff800d2d0000 registered with id=0
[ 1.832343] tegra_nvdisp_bandwidth_register_max_config: couldn't find valid max config!
[ 1.832585] tegradc 15200000.nvdisplay: failed to register ihub bw client
[ 1.833233] tegradc: probe of 15200000.nvdisplay failed with error -7
[ 1.834013] tegradc 15210000.nvdisplay: disp0 connected to head1->/host1x/sor
[ 1.834072] tegradc 15210000.nvdisplay: parse_dp_settings: No dp-lt-settings node
[ 1.834220] tegradc 15210000.nvdisplay: DT parsed successfully
[ 1.834341] tegradc 15210000.nvdisplay: Display dc.ffffff800d350000 registered with id=0
[ 1.835993] tegra_nvdisp_bandwidth_register_max_config: couldn't find valid max config!
[ 1.836182] tegradc 15210000.nvdisplay: failed to register ihub bw client
[ 1.836709] tegradc: probe of 15210000.nvdisplay failed with error -7
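For anyone comparing logs across several modules, here is a minimal sketch that pulls the failed-probe lines out of a saved dmesg dump. It assumes the line format shown in the excerpt above; the function name `find_probe_failures` is made up for illustration and is not part of any NVIDIA or meta-tegra tooling.

```python
import re

# Matches kernel lines such as:
#   [ 1.833233] tegradc: probe of 15200000.nvdisplay failed with error -7
# The format is taken from the excerpt above; other kernels may differ.
PROBE_FAIL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+): probe of (?P<device>\S+) "
    r"failed with error (?P<errno>-?\d+)"
)


def find_probe_failures(dmesg_text):
    """Return (timestamp, driver, device, errno) for each failed probe line."""
    failures = []
    for line in dmesg_text.splitlines():
        m = PROBE_FAIL.search(line)
        if m:
            failures.append(
                (float(m["ts"]), m["driver"], m["device"], int(m["errno"]))
            )
    return failures
```

Run against the excerpt above, this should report both display heads (15200000.nvdisplay and 15210000.nvdisplay) failing with error -7.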

Looking those up on the NVIDIA forums, a couple of things were suggested. The advice typically falls back on updating to rel32.7.1, but one post suggests some patches for the BSP.

https://forums.developer.nvidia.com/t/only-nvidia-logo-is-shown-after-flashing-cannot-complete-oem-config-setup-through-gui/222216

I'm trying to crank out a kirkstone version to check, since we will be hard stuck once the latest Xavier NX, with yet another PCN for new memory, ends up in our hands. But since hardknott theoretically should already support this, it may be an issue worth looking at. I'll update with kirkstone results as they come in. Hopefully I'm just doing something dumb that is easily resolved, but I'm asking the question anyway. Thanks.

@madisongh
Member

Based on the commit message for 368edd0, support for FAB 301 didn't come in until L4T R32.6.1. The dev forum post you linked to also talks about R32.6.x/JetPack 4.6.

Hardknott stopped at L4T R32.5.2, so it's not going to support the newer FABs.
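The version reasoning here can be sketched as a tuple comparison. This is only an illustration: the R32.6.1 threshold is taken from the commit message cited above, not from an authoritative support matrix, and the helper names `parse_l4t` and `supports_fab_301` are hypothetical.

```python
import re


def parse_l4t(release):
    """Turn 'R32.6.1', 'rel32.7.1' or '32.4.4' into a comparable tuple."""
    m = re.fullmatch(r"(?:[Rr](?:el)?)?(\d+)\.(\d+)(?:\.(\d+))?", release.strip())
    if m is None:
        raise ValueError(f"unrecognized L4T release string: {release!r}")
    # A missing third component is treated as 0.
    return tuple(int(g or 0) for g in m.groups())


# Assumption: FAB 301 support first appears in R32.6.1, per the commit
# message mentioned in this thread.
FAB_301_MIN = parse_l4t("R32.6.1")


def supports_fab_301(release):
    return parse_l4t(release) >= FAB_301_MIN
```

Under that assumption, `supports_fab_301("R32.5.2")` is false for hardknott, while `supports_fab_301("rel32.7.1")` is true.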

@goetzdd
Author

goetzdd commented Sep 10, 2024

Sorry, I was referring to PCN206980 from May 2020, which covers adding Hynix memory to the BOM and is specified as applying to FAB 301 or later. It lists JetPack 4.4.1 / BSP 32.4.4 or later as required, so I assumed that hardknott, being on 32.5.2, would have it. If the BUPGEN doesn't have a 301, wouldn't I have issues flashing the boards?

@madisongh
Member

> If the BUPGEN doesn't have a 301 wouldn't I have issues flashing the boards?

Not necessarily. The older flashing tools would probably treat the 301 FAB like another one, not knowing that it should be handled differently.

@goetzdd
Author

goetzdd commented Sep 11, 2024

I modified the relevant conf files in hardknott to add fab=301 etc., but I still get the same result. With more testing and a deeper look, though, it is more of a mystery than before. I got another unit out, and I now have four that all report the same info when read using flash.sh --read-info:

Board ID(3668) version(301) sku(0001) revision(G.0)
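When checking a batch of modules, a line like the one above can be split into fields for side-by-side comparison. A small sketch, assuming the `key(value)` layout shown here holds for every unit; `parse_board_info` is an illustrative name, not an existing tool.

```python
import re

# Each field looks like key(value); the key list is taken from the line above.
FIELD = re.compile(r"(?P<key>Board ID|version|sku|revision)\((?P<value>[^)]*)\)")


def parse_board_info(line):
    """Split a 'Board ID(...) version(...) sku(...) revision(...)' line."""
    return {m["key"]: m["value"] for m in FIELD.finditer(line)}
```

Values are kept as strings so that leading zeros in the SKU survive. Comparing the resulting dictionaries across units would confirm that all four modules really report identical board info.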

Attached are two boot logs, labeled working/non-working, showing the two modules booting. The only weirdness I see is
"emc_init: Missing emc dt table!" and "emc_debugfs_t194_init: no EMC tables, aborting".

boot work.txt
boot-nowork.txt

Of the four, one worked right off the bat after flashing and three didn't. Then I re-enabled the console port and reflashed one of the non-working units, and voilà, it started working. So I thought, cool, I must have fixed it, and tried to reflash the next one, and, huh, still no display on that one. Then I realized it seems to flash completely only the first time after I extract the tar.gz. I was thinking, no, it can't be that, can it? So I repeated that this morning and, surprise, now #3 works.

So on the last one I flashed from the previously used tegraflash folder, which showed that a reflash from a used folder does not work. Then I extracted the tar.gz again, and one more reflash confirmed the theory.

So this apparently isn't really an issue of fab=300 vs fab=301. A few files appear to change on disk after a flash. The most significant change is to tegra194-a02-bpmp-p3668-a00.dtb, which is probably why the "missing emc dt table" message shows up.

So for us, just re-extracting the tar.gz makes it work every time. Finding out why would be interesting, but unfortunately I probably won't be allocated the time to do so.
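One way to pin down exactly which files change is to hash a freshly extracted folder against one that has already been used for a flash. A rough sketch, with both directory paths supplied by the caller; the helper names are made up for this example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large image files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_trees(fresh_dir, used_dir):
    """Report files that differ between a fresh and a used extraction."""
    fresh, used = hash_tree(fresh_dir), hash_tree(used_dir)
    return {
        "changed": sorted(p for p in fresh.keys() & used.keys() if fresh[p] != used[p]),
        "only_in_fresh": sorted(fresh.keys() - used.keys()),
        "only_in_used": sorted(used.keys() - fresh.keys()),
    }
```

If the observation above holds, the BPMP dtb should show up under "changed" after a flash run.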
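The workaround can be scripted so that every flash starts from a pristine copy. A minimal sketch that only handles the extraction step: the archive path is whatever tar.gz your build produced, `extract_fresh` is a hypothetical helper, and the flash itself would still be run by hand from the returned directory.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_fresh(archive, parent=None):
    """Extract the archive into a brand-new directory and return its path."""
    workdir = Path(tempfile.mkdtemp(prefix="tegraflash-", dir=parent))
    # Newer Python versions support extraction filters; use the "data"
    # filter when available and fall back to the old behavior otherwise.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(workdir, **kwargs)
    return workdir
```

Because each call creates a new directory, files modified by a previous flash are never reused.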
